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Abstract. Motivated by applications in quantum chemistry and solid state physics, we apply 
general results from approximation theory and matrix analysis to the study of the decay properties of 
spectral projectors associated with large and sparse Hermitian matrices. Our theory leads to a rigor- 
ous proof of the exponential off-diagonal decay ( "nearsightedness" ) for the density matrix of gapped 
vN systems at zero electronic temperature in both orthogonal and non-orthogonal representations, thus 

^~^ providing a firm theoretical basis for the possibility of linear scaling methods in electronic structure 

C_^ calculations for non-metallic systems. We further discuss the case of density matrices for metallic 

Cm systems at positive electronic temperature. A few other possible applications are also discussed. 
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1. Introduction. The physical and chemical properties of raaterials are largely 
determined by the electronic structure of the atoms and molecules found in them. 
In all but the simplest cases, the electronic structure can only be determined ap- 
proximately, and since the late 1920's a huge amount of work has been devoted to 
i^H finding suitable approximations and numerical methods for solving this fundamental 

Cd problem. Traditional methods for electronic structure computations arc based on the 

H solution of generalized eigenvalue problems ( "diagonalization" ) for a sequence of large 

I— ^ Hermitian matrices, known as one-particle Hamiltonians. The computational cost of 

, this approach scales cubically in the size n of the problem, which is in turn deter- 

1^ mined by the number of electrons in the system. For large systems, the costs become 

(^ prohibitive; this is often referred to as "the 0{n^) bottleneck" in the literature. 

l/~J In the last two decades, a number of researchers have been developing approaches 

O^ that are capable in many cases to achieve "optimal" computational complexity: the 

computational effort scales linearly in the number of electrons, leading to better per- 
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CO formance for sufficiently large systems and making the electronic structure problem 



o 

(N 



tractable for large-scale systems. These methods, often referred to as ^^0{n) meth- 
ods," apply mostly to insulators. They avoid diagonalization by computing instead 
• • the density matrix, a matrix which encodes all the important physical properties of 

. ^ the system. For insulators at zero temperature, this is the spectral projector onto the 

S^ invariant subspace associated with the eigenvalues of the Hamiltonian falling below 

^ a certain value. For systems at positive temperatures, the density matrix can be 

expressed as a smooth function of the Hamiltonian. 

The possibility of developing such methods rests on a deep property of electronic 
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matter, called "nearsightedness" by W. Kohn [75]. Kohn's "Nearsightedness Princi- 
ple" expresses the fact that for a large class of systems the effects of disturbances, or 
perturbations, remain localized and thus do not propagate beyond a certain (finite) 
range; in other words, far away parts of the system do not "see" each other. Math- 
ematically, this property translates into the rapid off-diagonal decay in the density 
matrix. The fast fall-off in the density matrix entries has been often assumed with- 
out proof, or proved only in special cases. Moreover, the precise dependence of the 
rate of decay on properties of the system (such as the band gap in insulators or the 
temperature in metallic systems) has been the subject of much discussion. 

The main goal of this paper is to provide a rigorous mathematical foundation for 
linear scaling methods in electronic structure computations. We do this by deriving 
estimates, in the form of decay bounds, for the entries of general density matrices for 
insulators, and for metallic systems at positive electronic temperatures. We also ad- 
dress the question of the dependence of the rate of decay on the band gap and on the 
temperature. Although immediately susceptible of physical interpretation, our treat- 
ment is purely mathematical. By stripping the problem down to its essential features 
and working at the discrete level, we are able to develop an abstract theory cover- 
ing nearly all types of systems and discretizations encountered in actual electronic 
structure problems. 

Our results are based on a general theory of decay for the entries in analytic 
functions of sparse matrices, initially proposed in [T2j [Ml I106J and further developed 
here. The theory is based on classical approximation theory and matrix analysis. A 
bit of functional analysis will be used when considering a simple model of "metallic 
behavior," for which the decay in the density matrix is very slow. 

The approach described in this paper has a number of potential applications be- 
yond electronic structure computations, and can be applied to any problem involving 
functions of large matrices where "locality of interaction" plays a role. Towards the 
end of the paper we briefly review the possible use of decay bounds in the study of 
correlations in quantum statistical mechanics and information theory, in the analysis 
of complex networks, and in some problems in classical numerical linear algebra, like 
the computation of invariant subspaces of symmetric tridiagonal matrices. The dis- 
cussion of these topics will be necessarily brief, but we hope it will stimulate further 
work in these areas. 

In this paper we are mostly concerned with the theory behind 0{n) methods 
rather than with specific algorithms. Readers who are interested in the computational 
aspects should consult any of the many recent surveys on algorithms for electronic 
structure computations; among these, [201 E3 1113|, 1116) are especially recommended. 

The remainder of the paper is organized as follows. Section [2] provides some 
background on electronic structure theory. The formulation of the electronic struc- 
ture problem in terms of spectral projectors is reviewed in section [S} A survey of 
previous, related work on decay estimates for density matrices is given in section |4l 
In section [s] we formulate our basic assumptions on the matrices (discrete Hamil- 
tonians) considered in this paper, particularly their normalization and asymptotic 
behavior for increasing system size (n — > oo). The approximation (truncation) of 
matrices with decay properties is discussed in section [61 A few general properties of 
orthogonal projectors are established in section[7l The core of the paper is represented 
by section ^ where various types of decay bounds for spectral projectors are stated 
and proved. In section [9] we discuss the transformation to an orthonormal basis set. 



The case of vanishing gap is discussed in section 10 Other applications of our results 
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and methods are mentioned in section [TT] Finally, concluding remarks and some open 
problems are given in section [T2| 

2. Background on electronic structure theories. In this section we briefly 
discuss the basic principles underlying electronic structure theory. For additional 
details the reader is referred to, e.g., [^ 1771 155 1 P I ITTHl [HS] . 

Consider a physical system formed by a number of nuclei and Uf, electrons in 
three-dimensional (3D) space. The time-independent Schrodinger equation for the 
system is the eigenvalue problem 

Htot*tot = ^tot^tot, (2.1) 

where 'Htot is the many-body Hamiltonian operator, _Etot is the total energy and the 
functions ^I^tot are the eigenstates of the system. 

The Born-Oppenheimer approximation allows us to separate the nuclear and elec- 
tronic coordinates. As a consequence, we only seek to solve the quantum mechanical 
problem for the electrons, considering the nuclei as sources of external potential. Then 



the electronic part of equation (2.1 1 can be written as 

-H* = ^*, (2.2) 

where E is the electronic energy and the eigenstates VE* are functions of 3ne spatial 
coordinates and rie (discrete) spin coordinates. 

We denote spatial coordinates as r and the spin coordinate as cr; each electron 

is then defined by 3 -I- 1 coordinates x,j = * I , and wavefunctions are denoted as 



^(xi, . . . ,x„^). Then the electronic Hamiltonian operator in (2.2) can be written as 

H = T + Kxt+Ke, 

where T = — |V^ is the kinetic energy, V^xt is the external potential (i.e., the potential 
due to the nuclei) and V^c = \ Y^^lj \r~r\ ^^ ^^^ potential due to the electron-electron 
repulsionjj Moreover, the ground-state energy is given by 

£;o = min(*|-H|*), 

where the minimum is taken over all the normalized antisymmetric wavefunctions 
(electrons being Fermions, their wavefunction is antisymmetric). The electronic den- 
sity is defined as 

p{v) = rie^ / dx2 . . . / dx„J*(r,cr,X2, . . . ,x„Jp. 

In this expression, the sum over a is the sum over the spin values of the first electron, 
while integration with respect to x^, with 2 < i < n^, denotes the integral over M'^ 
and sum over both possible spin values for the ith electron. 



Observe that (2.2 1 is a many-particle equation that cannot be separated into 



several one-particle equation because of the term 140- Of course, being able to turn 



(2.2) into a separable equation would simplify the problem considerably, since the 

^As is customary in physics, we use here atomic units, that is, e^ = h = m = 1, with e = 
electronic charge, h = reduced Planck's constant and m = electronic mass. 
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number of unknowns per equation would drop from Srig + 
motivation for one-electron methods. 

For non-interacting particles, the many-body eigenstates ^(xi,. 
written as Slater determinants of occupied orbitals 01 (xi), . . . , 4>n^{^' 



to 3 + 1. This is the 



J, 



, x„_, ) can be 



*(xi, 



J Xjlc j 



0i(xi) 



01 (Xn, ) 



= (xi) 



?^ne v^n 



where each orbital satisfies a single-particle eigenstate equation 'Hi4>i — Ei(j)i. In gen- 
eral, the name "one-particle method" is used also when self-consistent terms (e.g., 
involving the density) are present in Jif, in this case, the equations are solved iter- 
atively, computing at each step the solution to a single-particle problem and then 
filling the lowest eigenstates with one electron each, to form a Slater determinant. 
However, some of the properties of a true non-interacting system (such as the fact 
that the energy is the sum of the eigenvalues of occupied states) are lost. 

A fundamental example of one-particle method is density functional theory (DFT) . 
The main idea behind DFT consists in rewriting the ground-state energy as a density 
functional rather than a wavefunction functional. Indeed, the first Hohenberg-Kohn 
theorem [66] states that the potential is uniquely (up to a constant) determined by 
the ground-state density p(r). In other words, the system can be seen as character- 
ized by the density rather than by the potential. Moreover, the ground-state density 
of a system with given external potential can be computed by minimizing a suitable 
energy functional of p (second Hohenberg-Kohn theorem). 

While of crucial theoretical importance, though, these results do not give a recipe 
for computing electronic structures. The next important step comes with the Kohn- 
Sham construction [76] : roughly speaking, one replaces the original, non-separable 
system with a fictitious system of non-interacting electrons that have exactly the 
same density as the original system. The single-particle equations for the Kohn-Sham 
system are (neglecting spin): 



+ y(r)U,(r)=e,V^(r), 



where the ipi's are the Kohn-Sham orbitals and V(r) is the single-electron potential. 
The associated density is 



p(r)=£|^,(r)|^ 



The single-particle potential V(r) can be written as 

p(r 



y(r) = T/ext(r) + 



-dr' + V^Mir), 



where the term 14;c[p](r) is called exchange-correlation potential and depends on the 
density. It is important to point out that the Kohn-Sham construction is not an ap- 
proximation, in that the Kohn-Sham equations are exact and yield the exact density. 
On the other hand, the exchange-correlation energy is not known in practice and 
needs to be approximated. In the local density approximation (LDA) framework, 
for instance, the exchange energy is based on the energy of a uniform electron gas. 
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Introducing spin allows for a more refined approximation (LSDA, or local spin-density 
approximation). One may also include gradient corrections, thus obtaining the so- 
called generalized gradient approximation (GGA). 

The solution of Kohn-Sham equations is usually computed via self-consistent it- 
erations. The iterative process begins with an approximation for the density; the 
associated approximate exchange-correlation potential is injected in the Kohn-Sham 
equations. The output density is then used to form a new approximation of the 
potential. The process continues until the update term for the density or the poten- 
tial becomes negligible. Observe that the basic building block of this computational 
technique is the solution of an eigenvalue problem for non-interacting particles. 

Electrons at the lowest atomic-like levels ('core' electrons) do not change much 
their state within chemical processes. For this reason, many computational techniques 
do not consider them explicitly, and replace instead the Coulomb attraction of the 
nucleus with a potential (called pseudopotential) that includes the effect of the core 
electrons on the valence electrons. This approach is always employed when using 
plane waves as a basis for wavefunctions, since the number of plane waves required to 
represent core electrons is prohibitive. 

3. Density matrices. As mentioned earlier, conventional methods for electronic 
structure calculations require the repeated solution of linear eigenvalue problems for 
a one-electron Hamiltonian operator of the form Ti. = — ^V^ -I- V{r). In practice, 
operators are discretized by grid methods or via Galerkin projection onto the finite- 
dimensional subspace spanned by a set of basis functions {4>i}f^i- When linear combi- 
nations of atom-centered Slater or Gaussian-type functions (see below) are employed, 
the total number of basis functions is n « nf, • rig, where rig is the number of (valence) 
electrons in the system and n;, is a small or moderate integer related to the number of 
basis functions per atom. Traditional electronic structure algorithms diagonalize the 
discrete Hamiltonian resulting in algorithms with 0{n^) (equivalently, 0(n'^)) opera- 
tion count [771 IMl [TTB] . In these approaches, a sequence of generalized eigenproblems 
of the form 

H^j,^e,S^^, l<i<n„ (3.1) 

is solved, where H and S are, respectively, the discrete Hamiltonian and the overlap 



matrix relative to the basis set {4>i}^^i- The eigenvectors -ijji in (3.1 1 are known as the 
occupied states, and correspond to the Hf. lowest generalized eigenvalues ei < • • • < 
En^ , the occupied levels. The overlap matrix S is just the Gram matrix associated 
with the basis set: Sij = {(j)j,(j)i) for all i,j, where (•, •) denotes the standard L^-inner 
product. In Dirac's hra-ket notation, which is the preferred one in the physics and 
chemistry literature, one writes Sij = {4ii\4ij). For an orthonormal basis set, S = In 



(the n X n identity matrix) and the eigenvalue problem (3.1 ) is a standard one. 

Instead of explicitly diagonalizing the discretized Hamiltonian H, one may refor- 
mulate the problem in terms of the density operator P, which is the 5'-orthogonal 
pro jectoirl onto the iif-invariant subspace corresponding to the occupied states, that 



is, the subspace spanned by the ne eigenvectors ipi in (3.1 1. Virtually all quanti- 
ties of interest in electronic structure theory can be computed as functionals of the 
density matrix P; see, e.g., [MIIMIIST]. It is this reformulation of the problem that 
allows for the development of potentially more efficient algorithms for electronic struc- 
ture, including algorithms that asymptotically require only 0{ne) (equivalently, 0{n)) 



That is, orthogonal with respect to the inner product associated with S. 
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arithmetic operations and storage. Most current methodologies, including Hartree- 
Fock, Density Functional Theory (e.g., Kohn-Sham), and hybrid schemes (like BLYP) 
involve self-consistent field (SCF) iterations, in which the density matrix P must be 
computed at each SCF step, typically with increasing accuracy as the outer iteration 
converges; see, e.g., [77lll37j . 

As stated in section [l] in this paper we use some classical results from polynomial 
approximation theory and matrix analysis to provide a mathematical foundation for 
linear scaling electronic structure calculations for a very broad class of systems. We 
assume that the basis functions (j)i are localized, i.e., decay rapidly outside of a small 
region. Many of the most popular basis sets used in quantum chemistry, such as 
Gaussian- type orbitals (GTO), which are functions of the form 

(l){x,y,x) = Cx"^y"yz''^e-'"'' , 

where C is a normalization constant, satisfy this requirement [77]. For systems with 
sufficient separation between atoms, this property implies a fast off-diagonal decay 
of the entries of the Hamiltonian matrix; moreover, a larger distance between atoms 
corresponds to a faster decay of matrix entries [771 page 381]. If the entries that fall 
below a given (small) truncation tolerance are set to zero, the Hamiltonian turns out 
to be a sparse matrix. 

Decay results are especially easy to state in the banded casejjbut more general 
sparsity patterns will be taken into account as well. 

We can also assume from the outset that the basis functions form an orthonormal 
set. If this is not the case, we perform a congruence transformation to an orthogonal 
basis and replace the original Hamiltonian H with H = Z^HZ, where S~^ = ZZ^ 
is either the Lowdin {Z = 5*^^/^, see [85]) or the inverse Cholesky (Z ~ L^^ , with 
S — LL?^) factor of the overlap matrix S\ see, e.g., [24]. Here Z'^ denotes the transpose 
of Z] for the Lowdin factorization, Z is symmetric {Z = Z'^). Up to truncation, the 
transformed matrix H is still a banded (or sparse) matrix, albeit not as sparse as H. 
Hence, in our decay results we can replace H with H. The entries in S~^, and therefore 
those in Z, decay at a rate which depends on the conditioning of S. This, in turns, 
will depend on the particular basis set used, on the total number of basis functions, 
and on the inter-atomic distances, with larger separations leading to faster decay. 
This is further discussed in section [9] below. We note that the case of tight-binding 
Hamiltonians is covered by our theory. Indeed, the tight-binding method consists in 
expanding the states of the physical system (e.g., a crystal) in linear combinations of 
atomic orbitals of the composing atoms; such an approximation is successful if the 
atomic orbitals have little overlap, which translates to a sparse Hamiltonian. The 
same applies to 'real space' finite difference (or finite element) approximations [116] . 

For a given sparse discrete Hamiltonian H in an orthonormal basis, we consider 
the problem of approximating the zero-temperature density matrix associated with 
H, that is, the spectral projector P onto the occupied subspace spanned by the 
eigenvectors corresponding to the smallest Ue eigenvalues of H: 

P ^ ll^l (S) Ipl + ■ ■ ■ + ll^n, (S) Ipn, = |V'l)(V'l| + --- + IV'n,)('(/'nJ, 

where Hipi = eiijji for i = 1, . . . ,ne. Clearly, P is Hermitian and idempotent: P = 



■^A square matrix A = {Aij) is said to be m- banded if Aij = whenever |J — j| > m; for instance, 
a tridiagonal matrix is 1-banded according to this definition. 
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P* — p'^. Consider now the Heaviside (step) function 

{1 a X < II 
^ ii X — fi 
if a; > ^ 

where the number /i (sometimes called the Fermi level or chemical potential, see [S3]), 
is such that £„_, < /i < Sn^+i- If the spectral gap 7 = Sn^+i ~ ^WeJ ^-l^o known as the 
HOMO-LUMO gap|^ is not too small, the step function h is well approximated by 
the Fermi-Dirac functiorP|/F_D(a;) = 1/(1 + e'^^^^^^) for suitable values of /3 > 0: 

P = h{H) « fFoiH) = [In + eMPiH - fllnW^ ■ 
The smaller 7, the larger /3 must be taken in order to have a good approximation: see 



Fig. 8.8 The parameter /? can be interpreted as an (artificial) inverse temperature; 
the zero-temperature limit is quickly approached as /3 — ?■ cx). A major advantage of 
the Fermi-Dirac function is that it is analytic; hence, we can replace h with fpD and 
apply to it a wealth of results from approximation theory for analytic functions. 

We emphasize that the study of the zero-temperature limit - that is, the ground 
state of the system - is of fundamental importance in electronic structure theory. In 
the words of [H Chapter 2, pp. 11-12]: 

...the lowest energy ground state of the electrons determines the 
structure and low-energy motions of the nuclei. The vast array of 
forms of matter - from the hardest material known, diamond car- 
bon, to the soft lubricant, graphite carbon, to the many complex 
crystals and molecules formed by the elements of the periodic table 
- are largely manifestations of the ground state of the electrons. 

The Fermi-Dirac distribution is also used when dealing with systems at positive 
electronic temperatures (T > 0) with a small or null gap (e.g., metallic systems); in 
this case /3 = (fc^T)"^, where kg is Boltzmann's constant. In particular, use of the 
Fermi-Dirac function allows one to compute thermodynamical properties (such as the 
specific heat) and the T-dependence of quantities from first principles. In this case, 
of course, the matrix P = fpoiH) is no longer an orthogonal projector, not even 
approximately. 

We mention in passing that it is sometimes advantageous to impose the nor- 
malization condition Tr(P) = 1 on the density matrix; indeed, such a condition 
is standard and part of the definition of density matrix in the quantum mechanics 
literature, beginning with von Neumann |1311 I133J . At zero temperature we have 
Tr(P) = rank(P) — n^, and P is replaced by —P. With this normalization P is no 
longer idempotent, except when Ue = 1. In this paper we do not make use of such 
normalization. 

The localization ( "pseudo-sparsity" ) of the density matrix for insulators has been 
long known to physicists and chemists; see the literature review in the following 
section. A number of authors have exploited this property to develop a host of linear 



*HOMO = Highest Occupied Molecular Orbital; LUMO = Lowest Unoccupied Molecular Orbital. 

^Several other analytic approximations to the step function are known, some of which are prefer- 
able to the Fermi-Dirac function from the computational point of view; see, e.g., 1801 for a compar- 
ative study. For theoretical analysis, however, we find it convenient to work with the Fermi-Dirac 
function. 
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scaling algorithms for electronic structure computations; see, e.g., [4l [5l [20l [23l [24l 
[Sl[Sl[751[7nilHnilMlin71IMllMl im [Ml [133. In this paper we derive explicitly 
computable decay bounds which can be used, at least in principle, to determine a 
priori the bandwidth or sparsity pattern of the truncation of the density matrix 
corresponding to a prescribed error. As we shall see, however, our decay estimates 
tend to be conservative and may be pessimistic in practice. Hence, we regard our 
results primarily as a theoretical contribution, providing a rigorous (yet elementary) 
mathematical justification for some important localization phenomena observed by 
physicists. An important aspect of our work is that our bounds are universal, in the 
sense that they only depend on the bandwidth (or sparsity pattern) of the discrete 
Hamiltonian H, on the smallest and largest eigenvalues of H, on the gap 7 and, 
when relevant, on the temperature T. In particular, our results are valid for a wide 
range of basis sets and indeed for different discretizations and representations of the 
Hamiltonian. 

4. Related work. The localization properties of spectral projectors (more gen- 
erally, density matrices) associated with electronic structure computations in quantum 
chemistry and solid state physics have been the subject of a large number of papers. 
Roughly speaking, the results found in the literature fall into three broad categories: 

1. Fully rigorous mathematical results for model systems (some quite general); 

2. "Semi-rigorous" results for specific systems; these results are often charac- 
terized as "exact", or "analytical" by the authors (usually phsyicists), but 
would not be recognized as mathematically rigorous by mathematicians; 

3. Non-rigorous results based on a mixture of heuristics, physical reasoning, and 
numerics. 

Contributions in the first group are typically due to researchers working in solid 
state and mathematical physics. These include the pioneering works of Kohn [^ and 
des Cloizeaux [35], and the more recent papers by Nenciu [SS], Brouder et al. [H], 
and a group of papers by Prodan, Kohn, and collaborators |103l 11041 I105J . 

Before summarizing the content of these contributions, we should mention that 
nearly all the results found in the literature are expressed at the continuous level, 
that is, in terms of decay in functions rather than decay in matrices. The functions 
are typically functions of (real) space; results are often formulated in terms of the 
density kernel, but sometimes in terms of the Wannier functions. The latter form an 
orthonormal basis set associated with a broad class of Hamiltonians, and are widely 
used in solid state physics. Since the Wannier functions span the occupied subspace, 
localization results for the Wannier functions immediately imply similar localization 
results for the corresponding spectral projector. Note, however, that the spectral 
projector may be exponentially localized even when the Wannier functions are not. 

At the continuous level, the density matrix p ; M^ x M'^ — s- C is the kernel of the 
density operator V defined by 

(7'^)(r)- / p(r,r')^(r')dr', 

regarded as an integral operator on L^(M''); here d — 1,2,3. The vectors r and r' 
represent any two points in M'*, and |r — r'| is their (Euclidean) distance. The density 
kernel can be expressed as 

p(r,r') = V^.(r)V'.(r')% 
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where now ipi is the (normahzed) eigenfunction of the Hamihonian operator H corre- 
sponding to the ith lowest eigenvalue, i = 1, . . . , rig, and the asterisk denotes complex 
conjugation; see. e.g., [5S]. The density operator P admits the Dunford integral 
representation 

V^^ [{zl-Uy'dz, (4.1) 

where F is a simple closed contour in C surrounding the eigenvalues of H corresponding 
to the occupied states, with the remaining eigenvalues on the outside. 

In |74] , Kohn proved the rapid decay of the Wannier functions for one-dimensional, 
one-particle Schrodinger operators with periodic and symmetric potentials with non- 
intersecting energy bands. This type of Hamiltonian describes one-dimensional, cen- 
trosymmetric crystals. Kohn's main result takes the following form: 

lim w{x) e«"= = , (4.2) 

a;— foo 

where 'w{x) denotes a Wannier function (here x is the distance from the center of 
symmetry) and g is a suitable positive constant. In the same paper (page 820) Kohn 
also points out that for free electrons (not covered by his theory, which deals only 
with insulators) the decay is very slow, being like x~^. 



A few observations are in order: first, the decay result (4.2 ) is asymptotic, that is, 
it implies fast decay at sufficiently large distances \x\ only. Second, (4.2) is consistent 
not only with strict exponential decay, but also with decay of the form x^e"' ^ where p 
is arbitrary (positive or negative) and q' > q. Hence, the actual decay could be faster. 



but also slower, than exponential. Since the result in (4.2) provides only an estimate 
(rather than an upper bound) for the density matrix in real space, it is not easy to use 
in actual calculations. To be fair, such practical aspects were not discussed by Kohn 
until much later (see, e.g., [7S]). Also, later work showed that the asymptotic regime is 
achieved already for distances of the order of 1-2 lattice constants, and helped clarify 
the form of the power-law prefactor, as discussed below. 

The techniques used by Kohn, mostly the theory of analytic functions in one 
complex variable and some classical asymptotics for linear second-order differential 
operators with variable coefficients, did not lend themselves naturally to the treatment 
of higher dimensionl cases or more complicated potentials. The problem of the validity 
of Kohn's results in two and three dimensions has remained open for a very long 
time, and has been long regarded as one of the last outstanding problems of one- 
particle condensed-matter physics. Partial results were obtained by des Cloizeaux 
[5S] and much later by Nenciu [M^. Des Cloizeaux, who studied both the decay 
of the Wannier functions and that of the associated spectral projectors, extended 
Kohn's localization results to 3D insulators with a center of inversion (a specific 
symmetry requirement) in the special case of simple, isolated (i.e., nondegenerate) 
energy bands; he also treated the tight-binding limit for arbitrary crystals. Nenciu 
further generalized Kohn's results to arbitrary d-dimensional insulators, again limited 
to the case of simple bands. 

The next breakthrough came much more recently, when Brouder et al. [21 man- 
aged to prove localization of the Wannier functions for a broad class of insulators 
in arbitrary dimensions. The potentials considered by these authors are sufficiently 
general for the results to be directly applicable to DFT, both within the LDA and the 
GGA frameworks. The results in |21j . however, also prove that for Chern insulators 
(i.e., insulators for which the Chern invariants, which characterize the band structure. 
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are non-vanishing) the Wannier functions do not decay exponentiahy, therefore leav- 
ing open the question of proving the decay of the density matrix in this case |129j . It 
should be mentioned that the mathematics in TP is fairly sophisticated, and requires 
some knowledge of modern differential geometry and topology. 

Further papers of interest include the work by Prodan, Kohn, and collaborators 
[103( 11041 I105J . From the mathematical standpoint, the most satisfactory results 
are perhaps those presented in ^04,. In this paper, the authors use norm estimates 
for complex symmetric operators in Hilbert space to obtain sharp exponential decay 
estimates for the resolvents of rather general Hamiltonians with spectral gap. Using 



the contour integral representation formula (4.1 1, these estimates yield (for sufficiently 



large separations) exponential spatial decay bounds of the form 

|p(r,r')| <Ce-"l""-'''l (C> 0, a > 0, const.) (4.3) 

for a broad class of insulators. A lower bound on the decay rate a (also known as 
the decay length or inverse correlation length) is derived, and the behavior of a as a 
function of the spectral gap 7 is examined. 

Among the papers in the second group, we mention [5^ [Ml [701 [751 1501 [HTl [T^ . 
These papers provide quantitative decay estimates for the density matrix, either based 
on fairly rigorous analyses of special cases, or on not fully rigorous discussions of 
general situations. Large use is made of approximations, asymptotics, heuristics and 
physically motivated assumptions, and the results are often validated by numerical 
calculations. Also, it is occasionally stated that while the results were derived in the 
case of simplified models, the conclusions should be valid in general. Several of these 
authors emphasize the difficulty of obtaining rigorous results for general systems in 
arbitrary dimension. In spite of not being fully rigorous from a mathematical point of 
view, these contributions are certainly very valuable and seem to have been broadly 
accepted by physicists and chemists. We note, however, that the results in these 
papers usually take the form of order-of-magnitude estimates for the density matrix 
p{r, r') in real space, valid for sufficiently large separations |r — r'|, rather than strict 
upper bounds. As said before of Kohn's results, this type of estimates may be difficult 
to use for computational purposes. 

In the case of insulators, the asymptotic decay estimates in these papers take the 
form 

g-a|r-r'| 

p(r,r') = C-i — , |r-r'|^oo (a > , cr > 0, const.), (4.4) 

where higher order terms have been neglected. Many of these papers concern the 
precise form of the power-law factor (i.e., the value of a) in both insulators and 
metallic systems. The actual functional dependence of a on the gap and of a on the 
dimensionality of the problem have been the subject of intense discussion, with some 
authors claiming that a is proportional to 7, and others finding it to be proportional 



to 77; see, e.g., [Sl|701 El HOI IHTllTMl and section 8.6 below. It appears that both 
types of behavior can occur in practice. For instance, in 731 the authors provide a 
tight-binding model of an insulator for which the density falls off exponentially with 
decay length a = 0{'~f) in the diagonal direction of the lattice, and a = 0{y/j) in 
non-diagonal directions, as 7 — 7> 0+. We also note that in [73|, the decay behavior of 
the density matrix for an insulator is found to be given (up to higher order terms) by 

g-Q|r-r'| 
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where d is the dimensionahty of the problem. In practice, the power-law factor in the 
denominator is often ignored, since the exponential decay dominates. 

In ini], Goedecker argued that the density matrix for d-dimensional {d = 1, 2, 3) 
metallic systems at electronic temperature T > behaves to leading order like 

Note that in the zero-temperature limit, a power-law decay (with oscillations) is ob- 
served. An analogous result was also obtained in [70]. Note that the decay length in 
the exponential goes to zero like the temperature T rather than like vT, as claimed 



for instance in [3]. We will return on this topic in section 8.7 

Finally, as representatives of the third group of papers we select [3] and [140] . 
The authors of [3] use the Fermi-Dirac approximation of the density matrix and 
consider its expansion in the Chebyshev basis. From an estimate of the rate of decay 
of the coefficients of the Chebyshev expansion of fpD (x) , they obtain estimates for 
the number of terms needed to satisfy a prescribed error in the approximation of the 
density matrix. In turn, this yields estimates for the rate of decay as a function of the 
extreme eigenvalues and spectral gap of the discrete Hamiltonian. Because of some 
ad hoc assumptions and the many approximations used the arguments in this paper 
cannot be considered mathematically rigorous, and the estimates thus obtained are 
not always accurate. Nevertheless, the idea of using a polynomial approximation for 
the Fermi-Dirac function and the observation that exponential decay of the expansion 
coefficients implies exponential decay in the (approximate) density matrix is quite 
valuable and, as we show in this paper, can be made fully rigorous. 



Finally, in |140j the authors present the results of numerical calculations for vari- 
ous insulators in order to gain some insight on the dependence of the decay length on 
the gap. Their experiments confirm that the decay behavior of p(r, r') can be strongly 
anisotropic, and that different rates of decay may occur in different directions; this is 
consistent with the analytical results in |73) . 

Despite this considerable body of work, the localization question for density ma- 
trices cannot be regarded as completely settled from the mathematical standpoint. 
We are not aware of any completely general and rigorous mathematical treatment of 
the decay properties in density matrices associated with general (localized) Hamiltoni- 
ans, covering all systems with gap as well as metallic systems at positive temperature. 
Moreover, rather than order-of-magnitude estimates, actual upper bounds would be 
more satisfactory. 

Also, almost all the above-mentioned results concern the continuous, infinite- 
dimensional case. In practice, of course, calculations are performed on discrete, n- 
dimensional approximations H and P to the operators H and V. The replacement of 
density operators with finite density matrices can be obtained via the introduction of 
a system of n basis functions {<f'i}f=n leading to the density matrix P = (Pij) with 



Pu 



,V<jy,) ^ {<i>^\V\<i>,) ^ I / p(r,r')0.(r)>,(r')drdr'. (4.6) 



As long as the basis functions are localized in space, the decay behavior of the density 
function pir^r') for increasing spatial separation |r — r'| is reflected in the decay 
behavior of the matrix elements Pij away from the main diagonal (i.e., for \i — j\ 
increasing) or, more generally, for increasing distance d(i,j) in the graph associated 
with the discrete Hamiltonian; see section [6] for details. 
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In developing and analyzing 0{n) methods for electronic structure computations, 
it is important to rigorously establish decay bounds for the entries of the density 
matrices that take into account properties of the discrete Hamiltonians. It is in 
principle possible to obtain decay estimates for finite-dimensional approximations 
using localized basis functions from the spatial decay estimates for the density kernel. 



Note, however, that any estimates obtained inserting (4.3) or (4.51 into (4.61 would 
depend on the particular set of basis functions used. 

In this paper we take a different approach. Instead of starting with the contin- 
uous problem and discretizing it, we establish our estimates directly for sequences 
of matrices of finite, but increasing order. We believe that this approach is closer 
to the practice of electronic structure calculations, where matrices are the primary 
computational objects. 

We impose a minimal set of assumptions on our matrix sequences so as to re- 
produce the main features of problems encountered in actual electronic structure 
computations, while at the same time ensuring a high degree of generality. Since our 
aim is to provide a rigorous and general mathematical justification to the possibility 
of 0{n) methods, this approach seems to be quite naturalF] 

To put our work further into perspective, we quote from two prominent researchers 
in the field of electronic structure, one a mathematician, the other a physicist. In his 
excellent survey [77] Claude Le Bris, discussing the basis for linear scaling algorithms, 
i.e., the assumed sparsity of the density matrix, wrote (pages 402 and 404): 

The latter assumption is in some sense an a posteriori assumption, 
and not easy to analyse [...] It is to be emphasized that the numerical 
analyis of the linear scaling methods overviewed above that would 
account for cut-off rules and locality assumptions, is not yet available. 

It is interesting to compare these statements with two earlier ones by Stefan 
Goedecker. In |5T] he wrote (page 261): 

To obtain a linear scaling, the extended orbitals [i.e., the eigen- 
functions of the one-particle Hamiltonian corresponding to occupied 
states] have to be replaced by the density matrix, whose physical be- 
havior can be exploited to obtain a fast algorithm. This last point 
is essential. Mathematical and numerical analyses alone are not suf- 
ficient to construct a linear algorithm. They have to be combined 
with physical intuition. 

A similar statement can be found in [53^, page 1086: 

Even though 0{N) algorithms contain many aspects of mathematics 
and computer science they have, nevertheless, deep roots in physics. 
Linear scaling is not obtainable by purely mathematical tricks, but 
it is based on an understanding of the concept of locality in quantum 
mechanics. 

In the following we provide a general treatment of the question of decay in spectral 
projectors that is as a priori as possible, in the sense that it relies on a minimal 
set of assumptions on the discrete Hamiltonians; furthermore, our theory is purely 



°We refer the historically-minded reader to the interesting discussion given by John von Neumann 
in |132) on the benefits that can be expected from a study of the asymptotic properties of large 
matrices, in alternative to the study of the infinite-dimensional (Hilbert space) case. 
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mathematical, and therefore completely independent of any physical interpretation. 
Nevertheless, our theory allows us to shed light on questions like the dependence of 
the decay length on the temperature in the density matrix for metals at T > 0; see 



section 8.7 We do this using for the most part fairly simple mathematical tools from 
classical approximation theory and linear algebra. 

Of course, in the development of practical linear scaling algorithms a deep knowl- 
edge of the physics involved is extremely important; we think, however, that locality 
is as much a mathematical phenomenon as a physical one. 

We hope that the increased level of generality attained in this paper (relative 
to previous treatments in the physics literature) will also help in the development 
of 0{n) methods for other types of problems where spectral projectors and related 
matrix functions play a central role. A few examples are discussed in section [TT| 

5. Normalizations and scalings. We will be dealing with sequences of matri- 
ces {Hn} of increasing size. We assume that each matrix iJ„ is an Hermitian n x n 
matrix, where n = rib ■ He', here Ub is fixed, while He is increasing. As explained in sec- 
tion |3] the motivation for this assumption is that in most electronic structure codes, 
once a basis set has been selected the number Ub of basis functions per particle is fixed, 
and one is interested in the scaling as rif^, the number of particles, increases. Hence, 
the parameter that controls the system size is Ue- We also assume that the system is 
contained in a d-dimensional box of volume V = L'^ and that L —>■ oo as rie —)■ cxd in 
such a way that the average density rie/L'^ remains constant {therm,odynam,ic limit). 
This is very different from the case of finite element or finite difference approxima- 
tions to partial differential equations (PDEs), where the system (or domain) size is 
considered fixed while the number of basis functions increases or, equivalently, the 
mesh size h goes to zero. 

Our scaling assumption has very important consequences on the structural and 
spectral properties of the matrix sequence {H„}; namely, the following properties hold: 

1. The bandwidth of iJ„, which reflects the interaction range of the discrete 
Hamiltonians, remains bounded as the system size increases [SHI page 454]. 
More generally, the entries of _ff„ decay away from the main diagonal at a 
rate independent of Ue (hence, of n). See section ro^ for precise definitions and 
generalizations. 

2. The eigenvalue spectra a{H„) are also uniformly bounded as n^ — ?> cxd. In 
view of the previous property, this is equivalent to saying that the entries 
in Hn are uniformly bounded in magnitude: this is just a consequence of 
Gersgorin's Theorem (see, e.g., [67l page 344]). 

3. For the case of Hamiltonians modeling insulators or semiconductors, the spec- 
tral (HOMO-LUMO) gap does not vanish as rie -^ oo. More precisely: if e]^ 
denotes the ith eigenvalue of iJ„, and 7„ := £,"+1 ^ £«" : then inf„7„ > 0. 
This assumption does not hold for Hamiltonians modelling metallic systems; 
in this case, inf„ 7„ = 0, i.e., the spectral gap goes to zero as ne — >■ 00. 

We emphasize that these properties hold for very general classes of physical sys- 
tems and discretization methods for electronic structure, with few exceptions (i.e., 
non- localized basis functions, such as plane waves). It is instructive to contrast these 
properties with those of matrix sequences arising in finite element or finite difference 
approximations of PDEs, where the matrix size increases as /i — > 0, with h a dis- 
cretization parameter. Considering the case of a scalar, second-order elliptic PDE, 
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we see that the first property only holds in the one-dimensional case, or in higher- 
dimensional cases when the discretization is refined in only one dimension. (As we will 
see, this condition is rather restrictive and can be relaxed.) Furthermore, it is gener- 
ally impossible to satisfy the second assumption and the one on the non-vanishing gap 
(inf „ 7„ > 0) simultaneously. Indeed, normalizing the matrices so that their spectra 
remain uniformly bounded will generally cause the eigenvalues to completely fill the 
spectral interval as n — >■ cx). That is, in general, given any two points inside this 
interval, for n large enough at least one eigenvalue of the corresponding n x n matrix 
falls between these two points. 

Our assumptions allow us to refer to the spectral gap of the matrix sequence {Hn} 
without having to specify whether we are talking about an absolute or a relative gap. 
As we shall see, it is convenient to assume that all the matrices in the sequence 
{Hn} have spectrum contained in the interval [—1, 1]; therefore, the absolute gap and 
the relative gap of any matrix Hn are the same, up to the factor 2. The spectral 
gap (more precisely, its reciprocal) is a natural measure of the conditioning of the 
problem of computing the spectral projector onto the occupied subspace, i.e., the 
subspace spanned by the eigenvectors of iJ„ corresponding to eigenvalues el'' < fi; 
see, e.g., |109[ page B4] for a recent discussion. The assumption inf„7„ > then 
simply means that the electronic structure problem is uniformly well- conditioned; 
note that this assumption is also very important for the convergence of the outer SCF 
iteration [77 lll37j . This hypothesis is satisfied for insulators and semiconductors, but 
not in the case of metals. 

6. Approximation of matrices by numerical truncation. Discretization of 
H, the Hamiltonian operator, by means of basis sets consisting of linear combinations 
of Slater or Gaussian-type orbitals leads to matrix representations that are, strictly 
speaking, full. Indeed, since these basis functions are globally supported, almost 
all matrix elements Hij = {cjjjj'Hc/fi) = ((/)i|'H|(/)j) are non-zero. The same is true 
for the entries of the overlap matrix Sij = {(j)j,(j)i). However, owing to the rapid 
decay of the basis functions outside of a localized region, and due to the local nature 
of the interactions encoded by the Hamiltonian operator, the entries of H decay 
exponentially fast with the spatial separation of the basis functions. (For the overlap 
matrix corresponding to Gaussian-type orbitals, the decay is actually even faster than 
exponential.) 

More formally, we say that a sequence oi n x n matrices An = ([^„]ij) has 
the exponential off-diagonal decay property if there are constants c > and a > 
independent of n such that 

|[A„]ij| <ce""l*-j|, foraU i,j^l,...,n. (6.1) 

Corresponding to each matrix An we then define for a nonnegative integer m the 
matrix An = ( [An']ij J defined as follows: 



[Ai: 




if \i-j\ < to; 
otherwise. 



Clearly, each matrix An is TO,-banded and can be thought of as an approximation, 
or truncation, of An- Note that the set of ?7i-banded matrices forms a vector subspace 
Vm C C"^" and that An is just the orthogonal projection of A„ onto Vm with 
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-ln(|P(lj)l) 

- GxponGntial bound 



f^^^J''t^^'^^f/->tfi^f^^/^rv^J^\^.f^i>'-''^ . 



Fig. 6.1. Logarithmic plot of the first row of a density matrix and an exponential hound. 



respect to the Frobenius inner product {A,B)p := Tt{B*A). Hence, An is the best 
approximation of A„ in Vm with respect to the Frobenius norm. 

Note that we do not require the matrices to be Hermitian or symmetric here; we 
only assume (for simphcity) that the same pattern of non-zero off-diagonals is present 
on either side of the main diagonal. The following simple result from jH] provides an 
estimate of the rate at which the truncation error decreases as the bandwidth m of the 
approximation increases. In addition, it establishes n-independence of the truncation 



satisfying (6.1) and 



error for n — >■ oo for matrix sequences satisfying (6.1 1. 

Proposition 6.1. J7^ Let A he a matrix with entries Ai 
let ^(™) he the corresponding m-handed approximation. Then for any e > there is 
an fh such that \\A — A^"^'\\i < e for m > rh. 

The integer m in the foregoing proposition is easily found to be given by 

a VI -e-" 

Clearly, this result is of interest only for m < n (in fact, for rh ^ n). 

Example 6.2. Let us consider a tridiagonal matrix H of size 200 x 200, with 
eigenvalues randomly chosen in [—1, — 0.5]U[0.5, 1], and let P he the associated density 
matrix with /i = 0. Numerical computation shows that P satisfies the hound (6.1 ) with 
a = 0.6 and c = 10 (as long as its entries are larger than the machine precision). 
Fig. \6.1\ depicts the ahsolute value of the entries in the first row of P and the hound 
(6.1 ), in a logarithmic scale. Choose, for instance, a tolerance e = 10"^; then it follows 
from the previous formula that the truncated matrix p(™' satisfies \\P — P(™^||i < e 
for any bandwidth to > 29. 

What is important about this simple result is that when applied to a sequence 
{An} = ([A„]ij) of n X n matrices having the off-diagonal decay property (6.1 1 with c 



and a independent of n, the bandwidth m is itself independent of n. For convenience, 

A* 



we have stated Proposition 6.1 in the 1-norm; when A 
holds for the 2-norm, owing to the inequality 



the same conclusion 



\\Ah<VUhU\[ 



(6.2) 
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(see [57l Corollary 2.3.2]). Moreover, a similar result also applies to other types of 
decay, such as algebraic (power-law) decay of the form 

c 

\\An\ii\ < -, , , for all j, 7 = 1, . . . , n 

~ \i-.i\p + ^ 

with c and p independent of n, as long as p > 1. 

Remark 6.3. It is worth emphasizing that the above considerations do not re- 
quire that the matrix entries [An\ij themselves actually decay exponentially away from 
the main diagonal, but only that they are bounded above in an exponentially decaying 
manner. In particular, the decay behavior of the matrix entries need not be mono- 
tonic. 

Although we have limited ourselves to absolute approximation errors in various 
norms, it is easy to accommodate relative errors by normalizing the matrices. Indeed, 
upon normalization all the Hamiltonians satisfy ||-ff„||2 = 1; furthermore, for density 
matrices this property is automatically satisfied, since they are orthogonal projectors. 
In the next section we also consider using the Frobenius norm for projectors. 

The foregoing considerations can be extended to matrices with more general decay 
patterns, i.e., with exponential decay away from a subset of selected positions {i,j) 
in the matrix; see, e.g., [H] as well as [21]. In order to formalize this notion, we first 
recall the definition of geodetic distance d{i,j) in a graph |37j : it is the number of 
edges in the shortest path connecting two nodes i and j, possibly infinite if there is no 
such path. Next, given a (sparse) matrix sequence {An} we associate with each matrix 
An a graph G„ with n nodes and m — 0{n) edges. In order to obtain meaningful 
results, however, we need to impose some restrictions on the types of sparsity allowed. 
Recall that the degree of node i in a graph is just the number of neighbors of i, i.e., 
the number of nodes at distance 1 from i. We denote by deg„(i) the degree of node 
i in the graph Gn- We shall assume that the maximum degree of any node in G„ 
remains bounded as n ^ cxd; that is, there exists a positive integer D independent of 
n such that maxi<i<„ deg„(i) < D for all n. Note that when An = Hn (discretized 
Hamiltonian), this property is a mathematical restatement of the physical notion of 
locality, or finite range, of interactions. 

Now let us assume that we have a sequence of n x n matrices An — {[An]ij) 
with associated graphs Gn and graph distances dn{i,j)- We will say that An has the 
exponential decay property relative to the graph Gn if there are constants c > and 
a > independent of n such that 

|[A„]z,|<ce-"'*"(^^j), foraU i,j = l,...,n. (6.3) 

We have the following simple result. 

Proposition 6.4. Let {An} be a sequence ofnxn matrices satisfying the expo- 
nential decay property (KM relative to a sequence of graphs {G„} having uniformly 
bounded maximal degree. Then, for any given < e < c, each An contains at most 
0{n) entries greater than e in magnitude. 

Proof. For a fixed node i, the condition |[A„]ij| > e together with (6.3) immedi- 
ately implies 

d„(z,i)<-ln(-). (6.4) 

a Ve/ 



Since c and a are independent of n, inequality (6.4) together with the assumption that 



the graphs G„ have bounded maximal degree implies that for any row of the matrix 
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(indexed by i), there is at most a constant number of entries that have magnitude 
greater than e. Hence, only 0{n) entries in A„ can satisfy |[yl„]ij| > e. D 

Remark 6.5. Note that the hypothesis of uniformly bounded maximal degrees is 
certainly satisfied if the graphs G„ have uniformly bounded bandwidths (recall that the 
bandwidth of a graph is just the bandwidth of the corresponding adjacency matrix). 
This special case corresponds to the matrix sequence {An} having the off-diagonal 
exponential decay property. 

Under the same assumptions of Proposition |6.4[ we can show that it is possible to 
approximate each An to within an arbitrarily small error e > in norm with a sparse 
matrix An (i.e., a matrix containing only 0{n) non-zero entries). 

Proposition 6.6. Assume the hypotheses of Proposition \6^ are satisfied. Define 
the matrix An = ( [^n lii I , where 



Hj 



M(m)i.^l K]y if dn{i,j)<m; 
I otherwise. 



(m)| 



Then for any given e > 0, there exists rh independent ofn such that \\An — An ||i < e, 
for all m>fh. Moreover, if A = A* then it is also \\An — An \\2 < e for all m> ffi. 
Furthermore, each An contains only 0{n) non-zeros. 
Proof. For each n and m and for I < j < n, let 

K™(j) := {i I 1 < z < n and d„(i, j) > m} . 



We have 



I An - Ai") II 1 = max Y. I [^»]y I ^ c ,5^1^/ J2 e- 

l<7<n ^ — ^ l<7<n ^ — ^ 



ad„{i,j) 



Letting A := e ", we obtain 



A" 



||yl„-Alr^||i <c max V A'^-^''^') < c V A'^' < c V A*^' = . 

l<j<n ^—i -^-^ ^—1 1 — A 

^ ^ ieK'^iJ) k=m+l fc=m+l 

Since < A < 1, for any given e > we can always find ffi such that 

c < e for all m > fh. 

1 — A 

liAn = A*n, then ||A„- Ai'"^||2 < ||A„- Ai"^||i < e for all m > fh. The last assertion 
follows from the bounded maximal degree assumption. D 

Hence, when forming the overlap matrices and discrete Hamiltonians, only matrix 
elements corresponding to 'sufficiently nearby' basis functions (i.e., basis functions 
having sufficient overlap) need to be computed, the others being negligibly small. 
The resulting matrices are therefore sparse, and indeed banded for ID problems, 
with a number of non-zeros that grows linearly in the matrix dimension. The actual 
bandwidth, or sparsity pattern, may depend on the choice and numbering (ordering) of 
basis functions and (for the discrete Hamiltonians) on the strength of the interactions, 
i.e., on the form of the potential function V in the Hamiltonian operator. 

It should be kept in mind that while the number of non-zeros in the Hamiltonians 
discretized using (say) Gaussian- type orbitals is 0{n), the actual number of non-zeros 
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per row can be quite high, indeed much higher than when finite differences or finite 
elements are used to discretize the same operators. It is not unusual to have hundreds 
or even thousands of non-zeros per row. On the other hand, the matrices are very 
often not huge in size. As aheady mentioned, the size n of the matrix is the total 
number of basis functions, which is a small or moderate multiple (between 2 and 
25, say) of the number ng of electrons. For example, if Ub ~ 10 and rig ~ 2000, 
the size of H will be n « 20, 000 and H could easily contain several millions of 
non-zeros. This should be compared with 'real space' discretizations based on finite 
elements or high-order finite difference schemes |116j . The resulting Hamiltonians 
are usually very sparse, with a number of non-zero entries per row averaging a few 
tens at most 0. However, these matrices are of much larger dimension than the 
matrices obtained using basis sets consisting of atom-centered orbitals. In this case, 
methodologies based on approximating the density matrix are currently not feasible, 
except for ID problems. The same remark applies to discretizations based on plane 
waves, which tend to produce matrices of an intermediate size between those obtained 
using localized basis sets and those resulting from the use of real space discretizations. 
These matrices are actually dense and are never formed explicitly. Instead, they are 
only used in the form of matrix- vector products, which can be implemented efficiently 
by means of FFTs; see, e.g., |116) . 

The possibility of developing linear scaling methods for electronic structure largely 
depends on the localization properties of the density matrix P. It is therefore critical 
to understand the decay behavior of the density matrix. Since at zero temperature the 
density matrix is just a particular spectral projector, we consider next some general 
properties of such projectors. 

7. General properties of orthogonal projectors. While our main goal in 
this paper is to study decay properties in orthogonal projectors associated with certain 
sequences of sparse matrices of increasing size, it is useful to first establish some a 
priori estimates for the entries of general projectors. Indeed, the intrinsic properties 
of a projector like idempotency, positive semidefiniteness, and the relations between 
their trace, rank, and Frobenius norm tend to impose rather severe constraints on the 
magnitude of its entries, particularly for increasing dimension and rank. 

We begin by observing that in an orthogonal projector P, all entries Pij satisfy 
\Pij\ ^ 1 and since P is positive semidefinite, its largest entry is on the main diagonal. 
Also, the trace and rank coincide: Tr(P) = rank(P). Moreover, ||P||2 = 1 and 

iip|i^ = y^(p). 

In the context of electronic structure computations, we deal with a sequence of 
n X n orthogonal projectors {Pn} of rank rig, where n ~ Ub ■ n^ with n^ increasing 
and rib fixed. Hence, 

Tr(P„) = rank(P„) = ng, and ||P„||f = V^- (7.1) 

For convenience, we will call a sequence of orthogonal projectors {Pn} satisfying ( |7.1[ ) 
a density matrix sequence; the entries of P„ will be denoted by [Pn]ij- We have the 
following lemma. 

Lemma 7.1. Let {Pn} be a density matrix sequence. Then 

l-iii^j \Wn\ij\ 1 

ll^nlll ~ rib 
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Proof. Just observe that Tr(P„) = X]"=i[^n]" — ^e together with |[P„]m| < 1 
for all i imply that the minimum of the sum X]i=i |[^n]ii| is achieved when [P„]ii = 
z^ = J_ for all z. Hence, ELi l[^n]^.|' > f - ^- Therefore, 

^ |[P„],,f = \\PJ% - J2\^Pnhf < f 1 - -) ^e (7.2) 



1^2 4 = 1 

and the result follows dividing through by ||P„|||, = rig. D 

Remark 7.2. From the proof one can trivially see that the bound {7.2) is sharp. 
In section 10 we shall see a non-trivial example where the bound is attained. 

Theorem 7.3. Let {Pn} be a density matrix sequence. Then, for any e > 0, the 
number of entries of Pn greater than or equal to e in magnitude grows at most linearly 
with n. 

Proof. Clearly, it suffices to show that the number of off-diagonal entries [Pn]ij 
with |[Pn]ij| ^ e can grow at most linearly with n. Let 

I^{ii,j)\l<i,3<nandi^j} and I, = {(z, j) G 1 1 |[P„],,| > e} . 

Then obviously 

and if \Ie\ = K, then 

^ — ^ 1 1 H. 1 1 f, n ^ n 

Hence, by Lemma |7.1[ 



P P ~ n 



|2 






n\\p 

from which we obtain the bound 



A^ < ^ (l - -) , (7.3) 

which shows that the number K of entries of P„ with |[Pn]ij| > e can grow at most 
as 0(ri) for n — > oo. D 

Remark 7.4. Due to the presence of the factor e^ in the denominator of the 
bound {7.3), for small e the proportion of entries of Pn that are not smaller than 
e can actually be quite large unless n is huge. Nevertheless, the result is interest- 
ing because it shows that in any density matrix sequence, the proportion of entries 
larger than a prescribed threshold must vanish as n ^ oo. In practice, for density 
matrices corresponding to .sparse Hamiltonians with gap, localization occurs already 
for moderate values of n. 

We already pointed out in the previous section that if the entries in a matrix 
sequence {A„} decay at least algebraically with exponent p > 1 away from the main 
diagonal, with rates independent of n, then for any prescribed e > it is possible 
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to find a sequence of approxiniants < An ( witli a fixed bandwidtli m (or sparsity 

pattern) sucli tliat ||A„ — j4„ || < e. Tliis applies in particular to density matrix 
sequences. The next result shows that in principle, a linear rate of decay is enough 
to allow for banded (or sparse) approximation to within any prescribed relative error 
in the Frobenius norm. 

Theorem 7.5. Let {P„} be a density matrix sequence and assume that there 
exists c > independent of n such that \[Pn\ij\ "£ c/{\i — j\ + 1) for alli,j = 1, . . . , n. 
Then, for all e > 0, there exists a positive integer fh independent of n, such that 

WPn 



-fri \\f 



Pn 



< e for all m > m. 



where Pn is the m-handed approximation obtained by setting to zero all the entries 
of Pn outside the band. 

Proof. We subtract Pn from P„ and compute ||P„ — P„ |||. by adding the 

squares of the non-zeros entries in the upper triangular part of Pn — Pn diagonal by 
diagonal and multiplying the result by 2 (since the matrices are Hermitian). Using 
the decay assumption we obtain 



IP - P*-' 



(™)||2^<2c2 



E 



= 2c" 



E 



k 



^^ [k-{n + lW 



fe=l \ ' I fe. 

To obtain an upper bound for the right-hand side, we observe that the function 

X 



/(^) 



{x - a) 



2 ' 



a = n + 1, 



is strictly increasing and convex on the interval [l^n — m]. Hence, the sum can be 
bounded above by the integral of f{x) taken over the same interval: 



E 



< 



■ dx, a = n + 1. 



^^ {n-k + lf J, {x-aY 

Evaluating the integral and substituting a = n + 1 in the result we obtain 

1 



\\Pn-Pt^\\l<2c 



In ( + (n + 1) —— - - 

n \ ?Ti + 1 n 



Dividing by ||Pri|||^ = ri^. wc find 



I p _ p(™)||2 
I -T n -T n 1 1 p 

IIP l|2 



< 



2c2 



In ( 1 + (n + 1) 



1 



m + 1 



Recalling that n = ui, ■ n^, we can rewrite the last inequality as 






(™)||2 



< 



2c2 n + 1 
rie m + 1 



P l|2 



< 



2c^ n + 1 
m + 1 ne 



2c' 

771 + 1 V Ur- 



2c^ / 
m + 1 



a quantity which can be made arbitrarily small by taking m sufficiently large. D 

Remark 7.6. In practice, linear decay (or even algebraic decay with a small ex- 
ponent p > 1) is too .slow to be useful in the development of practical 0{n) algorithms. 
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For example, from the above estimates we obtain fh ~ 0{e^^) which is clearly not a 
very encouraging result, even allowing for the fact that the above bound may be pes- 
simistic in general. To date, practical linear scaling algorithms have been developed 
only for density matrix sequences exhibiting exponential off-diagonal decay. 

In the case of exponential decay, one can prove the foUowing result. 

Theorem 7.7. Let {Pn} be a density matrix sequence with \[Pn]ij\ < ce^"'*^-'!, 

where c > and a > are independent of n. Let < Pn \ be the corresponding 

sequence of m-banded approximations. Then there exists fco > independent of n and 
m such that 



\Pn-plr\ 



F ^ 1„ „-2am 



\Pn\\l 



< fco e 



Proof. Similar to that of Theorem |7.5[ except that it is now easy to evaluate the 
upper bound and the constants exactly. We omit the details. D 

Remark 7.8. It is immediate to see that the foregoing bound implies the much 
more favorable estimate fh = 0(lne~^). 

Again, similar results holds for arbitrary sparsity patterns, replacing \i — j\ with 
the graph distance. More precisely, the following result holds. 

Theorem 7.9. Let {Pn} be a density matrix sequence with the exponential decay 
property with respect to a sequence of graphs {G„} having uniformly bounded maximal 
degree. Then, for all e > 0, there exists a positive integer fh independent of n .such 
that 

< e tor all m> m. 



where Pn is sparse, i.e., it contains only 0{n) non-zeros. 

We consider now some of the consequences of approximating full, but localized 
matrices with sparse ones. The following quantity plays an important role in many 
electronic structure codes: 

{E) = TiiPH) = ei + £2 + • • • + £«,, 

where e^ denotes the ith eigenvalue of the discrete Hamiltonian H. Minimization 
of Tr(PiJ), subject to the constraints P = P* = P'^ and Tr(P) = rig, is the basis 
of several linear scahng algorithms; see, e.g., [Ml IMIIZZI HE 113 ISSl 1^ • Note that 
in the tight-binding model, and also within the independent electron approximation, 
the quantity {E) represents the single-particle energy [6l [53 l [97 l 1128) . Now, assume 
that H K, H and P k, P, and define the corresponding approximation of {E) as 
{E) = Tt{PH). (We note in passing that in order to compute (E) = Tt{PH), only 
the entries of P corresponding to non-zero entries in H need to be computed.) Let 
Ap = P- P mndAn = H - H. We have 

(E) = Tr[(P + Ap){H + Ah)] = Tr(PiJ) + Tr(PAH) + Tr(Api?) + Tr(ApAjf). 

Neglecting the last term, we obtain for 6e = \ {E) — (E) \ the bound 

6E<\Tr{PAH)\ + \Tr{ApH)\. 
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Recalling that the Frobenius norm is the matrix norm induced by the inner product 

{A,B) — Tt{B*A), using the Cauchy-Schwarz inequality and \\P\\f = ^/^^^ we find 

Se < V^WAhWf + \\Ap\\f\\H\\f. 

Now, since the orthogonal projector P is invariant with respect to scalings of the 
Hamiltonian, we can assume ||i?||F = 1, so that 5e < y^ II^hII-F + || Ap||i? holds. In 
practice, a bound on the relative error would be more meaningful. Unfortunately, it is 
not easy to obtain a rigorous bound in terms of the relative error in the approximate 
projector P. If, however, we replace the relative error in (E) with the normalized error 
obtained by dividing the absolute error by the number n,, of electrons, we obtain 

Se ^ \\Ah\\f ^ IIApIIf 
rip ~ ^/r^7 n« 



A similar bound for dEJn^ that involves matrix 2- norms can be obtained as follows. 
Recall that n — n\,- rig, and that |j^|j_F < %A^||^||2 for any n x n matrix A. Observing 
that the von Neumann trace inequality |68i pages 182-183] implies |Tr(PA//)| < 
Tr(P)|iAH||2 = ne||Affll2, we obtain 

— <||Ah||2 + ./^||Ap||2. (7.4) 

rie V Tip, 



Since rib is constant, an interesting consequence of (7.4) is that for large system sizes 
(i.e., in the limit as Ue — > oo), the normalized error in (E) is essentially determined 
by the truncation error in the Hamiltonian H rather than by the error in the density 
matrix P. 

On the other hand, scaling H so that ||i?||p = 1 may not be advisable in practice. 
Indeed, since the Frobenius norm of the Hamiltonian grows unboundedly for n^ — ?> cxi, 
rescaling H so that ||ii^||F — 1 would lead to a loss of significant information when 
truncation is applied in the case of large systems. A more sensible scaling, which is 
often used in algorithms for electronic structure computations, is to divide ||-ff|| by 
its largest eigenvalue in magnitude, so that ||iJ||2 = 1- This is consistent with the 
assumption, usually satisfied in practice, that the spectra of the Hamiltonians remain 
bounded as rig — >■ oo. (Note this is the same normalization used to establish the decay 
bounds in section Isj) With this scaling we readily obtain, to first order, the bound 

— <||Aff||2 + nfa||Ap||2, (7.5) 

He 

showing that errors in Ah and Ap enter the estimate for the normalized error in 
the objective function Tt{PH) with approximately the same weight, since rib is a 
moderate constant. We also note that since both error matrices Ah and Ap are 



Hermitian, (6.2 1 implies that the bounds (7.4) and (7.5) remain true if the 2- norm 



is replaced by the 1-norm. We mention that the problem of the choice of norm in 
the measurement of truncation errors has been discussed in |llll I114| . These authors 
emphasize the use of the 2-norm, which is related to the distance between the exact 
and inexact (perturbed) occupied subspaces X :— Range (P) and X :— Range (P) as 
measured by the sine of the principal angle between X and X; see |lll] . 

One important practical aspect, which we do not address here, is that in many 
quantum chemistry codes the matrices have a natural block structure (where each 
block corresponds, for instance, to the basis functions centered at a given atom); 
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hence, dropping is usually applied to submatrices rather than to individual entries. 
Exploitation of the block structure is also desirable in order to achieve high perfor- 
mance in matrix-matrix products and other operations, see, e.g., [M[[^[112j . 

We conclude this section with a few remarks on the infinite-dimensional case. 
Recall that any separable, complex Hilbert space M' is isometrically isomorphic to 
the sequence space 



e 



ex: 

{(Cn) I ^„ e C Vn G N and ^ |e„P < 00} . 



Moreover, if {e„} is an orthonormal basis in J^, to any bounded linear operator A 
on M" there corresponds the infinite matrix A = [Aij] acting on ^^, uniquely defined 
by Aij = {ej.Aei). Note that each column of A must be in £^, hence the entries A^ 
in each column of A must go to zero for i ^- 00. The same is true for the entries in 
each row (for j -^ 00) since A* = {A*^), the adjoint of A, is also a (bounded) operator 
defined everywhere on £'^. More precisely, for any bounded linear operator A — {Aij) 
on l'^ the following bounds hold: 

00 00 

Y^ l^yf < \\A\\l for ah i and ^ |Ayf < ||A||^ for all j , (7.6) 

1=1 i=l 

since ||A||2 = ||A*||2. 

An orthogonal projector P on J^ is a self-adjoint {V ~ V*), idempotent {P = V^) 
linear operator. Such an operator is necessarily bounded, with norm ||7^|| = 1. Hence, 
(7.6) implies 

CO 

where P — {Pij) denotes the matrix representation of P. The idempotency condition 
implies 

00 

P'J = Y. P'^^P^] ' ^°^ ^11 ^, j = 1, 2, . . . 

fe=i 

In particular, for i — j we get, using the hermiticity property Pij — P*f 

00 00 

P« = EP»fe-PH = 5Il^«fel'' ^°^'^11 1 = 1,2,... (7.8) 

k=l k=l 



Now, since P is a projector its entries satisfy \Pij\ < 1, therefore (7.8) is a strength- 
ening of inequality (7.7). Note in particular that the off-diagonal entries in the first 
row (or column) of P must satisfy 

those in the second row (or column) must satisfy 

J>2 
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and in general the entries Ptj with j > i must satisfy 



Ei^. 



zji' < 1 - E \Pk^\^ f°^ all i = 1, 2, . . . (7.9) 

j>i k=l 

Hence, decay in the off-diagonal entries in the ith row of P must be fast enough for the 



bounds (7.9) to hold. In general, however, it is not easy to quantify the asymptotic 



rate of decay to zero of the off-diagonal entries in an arbitrary orthogonal projector 



on £ . In general, the rate of decay can be rather slow. In section 10 we will see an 
example of spectral projector associated with a very simple tridiagonal Hamiltonian 
for which the off-diagonal entries decay linearly to zero. 

8. Decay results. In this section we present and discuss some results on the 
decay of entries for the Fermi-Dirac function applied to Hamiltonians and for the 
density matrix (spectral projector corresponding to occupied states). We consider 
both the banded case and the case of more general sparsity patterns. The proofs, 
which require some basic tools from polynomial approximation theory, will be given 
in subsection 18^ 



8.1. Bounds for the Fermi— Dirac function. We begin with the following 
result for the banded case. As usual in this paper, in the following one should think 
of the positive integer n as being of the form n — nh-rie with nt constant and 7ie -^ cxi. 

Theorem 8.1. Let m be a fixed positive integer and consider a sequence of 
matrices {H„} such that: 

(i) Hn is an n X n Hermitian, m-banded matrix for all n; 

(ii) For every n, all the eigenvalues of Hn lie in the interval [—1,1]. 
For a given Fermi level fi and inverse temperature (3, define for each n the n x n 
Hermitian matrix Fn '■= fpoiHri) — [In + e'^'^""'*^"'] . Then there exist constants 
c > and a > 0, independent of n, such that the following decay bound holds: 

\[Fnh\<ce-"\'-^\, i^j. (8.1) 

The constants c and a can be chosen as 

c - ^^^^, Mix) = max|/pz.(z)|, (8.2) 

a=-lnx, (8.3) 

m 

for any 1 < x < X; where 



^/Vif^Hi- 


^,^)- 


7r2)2 + 47r2/32- 


-|3^^{l-^J?)+^:^ 


+ 






V2/3 




^vmi 


-M^) 


-7r2)2+47r2/32 


+ /32(l + ;i2)+, 


t2 



\/2/? 



(8.4) 



and £^ is the unique ellipse with foci m — 1 and 1 with semi-axes ki > 1 and K2 > 0, 
and X — ^1 + ^2- 

Remark 8.2. The ellipse £^ in the previous theorem is unique because the identity 
\/ k\ — Kj = 1, valid for any ellipse with foci in 1 and —1, implies ki — K2 = l/('*i + 
K2), hence the parameter X — '*i + ^^2 alone completely characterizes the ellipse. 
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I0r 




- X = 1 .362346 



Fig. 8.1. Bounds ||8.1| with ^ = and /3 = 10, for three different values of x- 



Remark 8.3. Theorem \8.1\ can he immediately generalized to the case where the 
spectra of the sequence {Hn\ are contained in an interval [a, h], for any a < 6 e M. It 
suffices to shift and scale each Hamiltonian: 



Hn 



-Hn 



so that Hn has spectrum in [—1,1]. For the decay hounds to he independent of n, 
however, a and b must be independent of n. 

It is important to note that there is a certain amount of arbitrariness in the choice 
of Xj and therefore of c and a. If one is mainly interested in a fast asymptotic decay 
behavior (i.e., for sufhciently large |« — j|), it is desirable to choose x ^^ large as 
possible. On the other hand, if x is very close to x then the constant c is likely to 
be quite large and the bounds might be too pessimistic. Let us look at an example. 
Take /i = 0; in this case we have 



X = (tt + v//32 + 7r2) //3 and M{x) - | V (l + e^'^) | 



where C = i 



. X 



1 



2X 



Note that, in agreement with experience, decay is faster for smaller j3 (i.e., higher 
electronic temperatures); see sections 8.3 and 8.7 for additional details and discussion. 
Figures 8.1 and 8.2 show the behavior of the bound given by 

1) for P 



.1 ) on the first row 
10 and for three values of x- It 



of a 200 X 200 tridiagonal matrix [m 

is easy to see from the plots that the asymptotic behavior of the bounds improves as 

X increases; however, the bound given by x = 1.362346 is less useful than the bound 
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Fig. 8.2. Logarithmic plot of the bounds (|8.1|l with /j = and /3 = 10 
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Fig. 8.3. Plot of c as a function of x with fi = and (3 = 10. 



given by x — 1-3- Figure |873| is a plot of c as a function of x ^-nd it shows that c grows 
very large when x is close to x- This is expected, since fpoiz) has two poles, given 
by z = ±i 7r//3 on the regularity ellipse S^. It is clear from Figures 



8.1 



and 



8.2 



that 



X = 1-3 is the best choice among the three proposed values, if one is interested in 
determining a bandwidth outside of which the entries of Fn can be safely neglected. 
As already observed in [Mj I106J . improved bounds can be obtained by adaptively 
choosing different (typically increasing) values of x as |i — j| grows, and by using as 
a bound the (lower) envelope of the curves plotted in Figure |8.4[ which shows the 
behavior of the decay bounds for several values of x € (1-1: x)j with x ~ 1.3623463. 



The results of Theorem |8.1| can be generalized to the case of Hamiltonians with 
rather general sparsity patterns; see [TH [211 1106| . To this end, we make use of the 
notion of geodetic distance in a graph already used in section [61 The following result 
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Fig. 8.4. Logarithmic plot of the bounds KAt with /i = and /3 = 10, for several values of x- 



holds. 

Theorem 8.4. Consider a sequence of matriees {H„} such that: 
(i) Hn is an n X n Hermitian matrix for all n; 

(ii) the spectra a{Hn) are uniformly bounded and contained in [—1, 1] for all n. 
Let dn{i,j) be the graph distance associated with Hn- Then the following decay bound 
holds: 



\[Fn] 



< ce 



i„(*j) 



iy^j, 



(8.5) 



where 9 — lux and the remaining notation and choice of constants are as in Theorem 

EH 



We remark that in order for the bound (8.5) to be meaningful from the point of 



view of linear scaling, we need to impose some restrictions on the asymptotic sparsity 
of the graph sequence {Gn}- As discussed in section [6] 0{n) approximations of Fn 
are possible if the graphs Gn have maximum degree uniformly bounded with respect 
to n. This guarantees that the distance dn{i,j) grows unboundedly as |j — j| does, at 
a rate independent of n for n — > oo. 

8.2. Density matrix decay for systems with gap. The previous results es- 
tablish exponential decay bounds for the Fermi-Dirac function of general localized 
Hamiltonians and thus for density matrices of arbitrary systems at positive electronic 
temperature. In this subsection we consider the case of gapped systems (like insula- 
tors) at zero temperature. In this case, as we know, the density matrix is the spectral 
projector onto the occupied subspace. As an example, we consider the density ma- 
trix corresponding to the linear alkane n-Dopentacontane C52II106 composed of 52 
Carbon and 106 Hydrogen atoms, discretized in a Gaussian-type orbital basis. The 
number of occupied states is 209, or half the total number of electrons in the systemjj 
The cor responding Hamiltonian in the original non-orthogonal basis i s di splayed in 
Fig. 9.1 (top) and the 'orthogonalized' Hamiltonian H is shown in Fig. 9.1 (bottom). 
Fig. |8.5| displays the zero temperature density matrix, which is seen to decay expo- 
nentially away from the main diagonal. Comparing Fig. |8.5| and Fig. |9.H we can 



see that for a truncation level of 10 , the bandwidth of the density matrix is only 



^Here spin is being taken into account, so that the density kernel is given by p(r, r') 
2ES(''/'i(r)V»{r')*; see, e.g., [H page 10]. 
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Fig. 8.5. Magnitude of the entries in the density matrix for the linear alkane 052.^106 chain, 
with 209 occupied states. White: < lO"*; yellow: 10"* - IQ-"^; green: W'^ - lO"' '; blue: IQ-* - 
10~^ ; black: > 10~^. Note: nz refers to the number of 'black' entries. 



slightly larger than that of the Hamiltonian. The eigenvalue spectrum of the Hamil- 
tonian, scaled and shifted so that its spectrum is contained in the interval [—1, 1], is 
shown in Fig. 8.6 One can clearly see a large gap (sa 1.4) between the 52 low- lying 



eigenvalues corresponding to the core electrons in the system, as well as the smaller 
HOMO-LUMO gap (« 0.1) separating the 209 occupied states from the virtual (un- 
occupied) ones. It is worth emphasizing that the exponential decay of the density 
matrix is independent of the size of the system; that is, if the alkane chain was made 
arbitrarily long by adding C and H atoms to it, the density matrix would be of course 
much larger in size but the bandwidth would remain virtually unchanged for the same 
truncation level, due to the fact that the bandwidth and the HOMO-LUMO gap of 
the Hamiltonian do not appreciably change as the number of particles increases. It is 
precisely this independence of the rate of decay (hence, of the bandwidth) on system 
size that makes 0{n) approximations possible (and competitive) for large n. 



Let us now see how Theorem 8.1 can be used to prove decay bounds on the entries 
of density matrices. Let H be the discrete Hamiltonian associated with a certain 
physical system and let /i be the Fermi level of interest for this system. We assume 
that the spectrum of H has a gap 7 around /i, that is, we have 7 = e+ — e^ > 0, where 
£+ is the smallest eigenvalue of H to the right of /i and e~ is the largest eigenvalue of 
H to the left of /i. In the particular case of the HOMO-LUMO gap, we have e~ = £„^ 
and £"•" = e„^+i. 

The Fermi-Dirac function can be used to approximate the Heaviside function; 
the larger is /3, the better the approximation. More precisely, the following result is 
easy to prove (see |106j ): 
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Fig. 8.6. Spectrum of the Hamiltonian for Cz2H\(j%. 



Proposition 8.5. Let S > be given. If (3 is such that 



2 /1-6 



(8.6) 



then 1 — /fd(£ ) < S and fpoie:^) < S. 



In Fig. 8.7 we show Fermi-Dirac approximations to the Heaviside function (with 
0) for different values of 7 between 0.1 and 1, where /3 has been chosen 

^. The behavior 



a jump at /i 

so as to reduce the error in Proposition 



of /3 as a function of 7 according to (8.6 



between 0.1 and 1, where 
above the value S 



10- 



is plotted in Fig. 8.8 



As a consequence of Theorem |8.1| and Proposition |8.5| we have: 

Corollary 8.6. Let nj, be a fixed positive integer and n = n^, ■ n^,, where the 
integers n^ form a monotonically increasing sequence. Let {-ffn} be a sequence of 
Hermitian n x n matrices with the following properties: 

1. Each Hn has bandwidth m independent of n; 

2. There exist two fixed intervals /i = [— l,a],/2 = [b, 1] C M with ^ ^ b — a > 
0, such that for all n = Ub ■ rie, /i contains the smallest rie eigenvalues of 
Hn (counted with their multiplicities) and I2 contains the remaining n — n^, 
eigenvalues. 

Let Pn denote the nxn spectral projector onto the subspace spanned by the eigenvectors 
associated with the n^ smallest eigenvalues of Hn, for each n. Let 6 > be arbitrary. 
Then there exist constants c > 0, a > independent of n such that 



i]ij\ < mini 



\[Pn]ij\ < min- 



l,ce" 



'a\i-j\ 



for all i ^ j. 



•7) 
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Fig. 8.7. Approximations of Heaviside function by Fermi-Dirac function (fi = 0) for different 
values of 'y and 5 = 10~®. 
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Fig. 8.8. Behavior of the minimum acceptable value of (3 as a function of-y, for different values 



The constants c and a can be computed from (8.2) and (8.3), where x is chosen in 



the interval (l,x), with x given by (8.4| and (3 such that (8.6) holds 



Corollary |8.6| allows us to determine a priori a bandwidth m independent of n 
outside of which the entries of P„ are smaller than a prescribed tolerance r > 0. 
Observe that it is not possible to incorporate S in the exponential bound, but, at least 
in principle, one may always choose S smaller than a certain threshold. For instance. 
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one may take 6 < t/2 and define m as the smallest integer value of m such that the 
relation ce""™ < t/2 holds. 

In the case of Hamiltonians with a general sparsity pattern one may apply The- 
orem |8.4| to obtain a more general version of Corollary |8.6[ If the fixed bandwidth 
hypothesis is removed, the following bound holds: 

\[Pn]rj\ < min |l, ce^^''"(''^^| + S, for all i ^ j, (8.8) 

with 6 = In^. Once again, for the result to be meaningful some restriction on the 
sparsity patterns, like the uniformly bounded maximum degree assumption already 
discussed, must be imposed. 



8.3. Proof of decay bounds. Theorem |8.1| is a consequence of results proved 
in [12 (Thm. 2.2) and |106| (Thm. 2.2); its proof relies on a fundamental result 
in polynomial approximation theory known as Bernstein's Theorem |92j . Given a 
function / continuous on [—1, 1] and a positive integer k, the fcth best approximation 
error for / is the quantity 



Ekif)=mii max \ f (x) - pix)\ : p £ Pk 

I — l<a:<l 

where Pk is the set of all polynomials with real coefficients and degree less than 
or equal to k. Bernstein's theorem describes the asymptotic behavior of the best 
approximation error for a function / analytic on a domain containing the interval 

[-1,1]- 

Consider the family of ellipses in the complex plane with foci in —1 and 1. As 

already mentioned, an ellipse in this family is completely determined by the sum x > 1 

of its half-axes and will be denoted as E^ ■ 

Theorem 8.7. [Bernstein] Let the function f be analytic in the interior of the 
ellipse £^ and continuous on £^. Moreover, assume that f{z) is real for real z. Then 

EM) < -4^. 

where M{x) = max^e^^ 1/(^)1 • 

Let us now consider the special case where f{z) :— fpoiz) = 1/(1 + c'^^^^''^) is 
the Fermi-Dirac function of parameters /3 and /i. Observe that fpoiz) has poles in 
/J, ±15, so the admissible values for x with respect to fpu(z) are given by 1 < x < x, 
where the parameter x is such that ^ ± 15 S f x (^^^ regularity ellipse for / = fpo)- 
Also observe that smaller values of /? correspond to a greater distance between the 
poles of fpoiz) and the real axis, which in turn yields a larger value of x- Iii other 
words, the smaller /3, the faster the decay in Theorem |8.1| Explicit computation of x 



yields (8.4) 



Now, let Hn be as in Theorem |8.1| We have 

\\fFDiHn)-Pk{Hn)h= max \fFD{x)-pkix)\<Ek{fFD)<cq''+\ 

xe<y{H„) 

where c = 2xM{x)/{x ^ 1) ^^^d q ~ 1/x- The Bernstein approximation of degree k 
gives a bound on \[fFD{Hn)\ij\ when [pk{Hn)\ij = 0, that is, when \i — j| > mk. We 
may also assume \i — j\ < m{k + 1). Therefore, we have 

\[fFD{Hn)h\ < ce^C^+i)'"^'''^'") = ce-^^^'^+i) < ce-^l^-^l 
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As for Theorem 8.4 note that for a general sparsity pattern we have [{Hn) ]ij = 0, 
and therefore [pk{Hn)\ij = 0, whenever dn{i,j) > k. Writing dn{i,j) = fc+1 we obtain 



Let us now prove Corollary 8.6 Assume that (3 satisfies the inequality (8.61 
for given values of S and 7. If we approximate the Heaviside function with step 
at /i by means of the Fermi-Dirac function fpoix) = 1/(1 + e''^^"'^^), the pointwise 
approximation error is given by g{x) = e'''^~^V(l + 6^^^~'*'') for x < /it and by fpoix) 
for a; > /i. It is easily seen that g{x) is a monotonically increasing function, whereas 
fpD is monotonically decreasing. As a consequence, for each Hamiltonian iJ„ we have 
that 1 — /fdW < S for all eigenvalues X £ Ii and /p'd(A) < 6 for all A € /2. In other 
words, the pointwise approximation error on the spectrum of Hn is always bounded 
by S. Therefore, we have 

\[Pn - fFDiHn)U < \\Pn - fFD{Hn)h < S- 

We may then conclude using Theorem |8.1| 

\[PnU < \[fFD{Hn)U + 6 < C C""!'-^! + <5. 

Finally, recall that in an orthogonal projector no entry can exceed unity in absolute 



value. With this in mind, (8.7) and (8.8) readily follow 



8.4. Additional bounds. Theorems |8.1| and 8^ rely on Bernstein's result on 
best polynomial approximation. Following the same argument, one may derive decay 
bounds for the density matrix from any other estimate on the best polynomial ap- 
proximation error for classes of functions that include the Fermi-Dirac function. For 
instance, consider the following result of Achieser (see [55J Thm. 78], and Ij): 

Theorem 8.8. Let the function f be analytic in the interior of the ellipse £^. 
Suppose that |Re/(z)| < 1 holds in £^ and that f{z) is real for real z. Then the 
following hound holds: 

4 °° (—1)" 

^^'^^^ - ^ ^ {2v + 1) cosh((2;/ + l)(fc + 1) Inx) ' ^^'^^ 



The series in (8.9) converges quite fast; therefore, it suffices to compute a few 



terms explicitly to obtain a good approximation of the bound. A rough estimate 



shows that, in order to approximate the right hand side of (8.9) within a tolerance r, 
one may truncate the series after vq terms, where r'^" < r(l — r) and r — x ^• 

Observe that, as in Bernstein's results, there is again a degree of arbitrariness in 
the choice of x- However, the admissible range for x is smaller here because of the 
hypothesis |Re/(z)| < 1. 

The resulting matrix decay bounds have the form 

\[fFn{H^)U < - E (2. + l)cosh((2. + l)(d(^,,) + l)l^x) ^''''^ 

for the case of general sparsity patterns. While these bounds are less transparent 
than those derived from Bernstein's Theorem, they are computable. We found that 
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the bounds (8.10) improve on (8.1 1 for entries close to the main diagonal, but do not 
seem to have a better asymptotic behavior. A possibility would be to combine the 
two bounds by taking the smaller between the two values. 

So far we have only considered bounds based on best approximation of analytic 
functions defined on a single interval. In [61], Hasson has obtained an interesting 
result on polynomial approximation of a step function defined on the union of two 
symmetric intervals. Let a, 6 G M with < a < 6 and let sgn(x) be the sign function 
defined on [—6, —a] U [a, 6], i.e., sgn(x) = —1 on [—6, —a] and sgn(a;) — 1 on [a,b]. 
Notice that the sign function is closely related to the Heaviside function h{x), since 
we have h{x) = |(1 + sgn(a::)). 



Proposition 8.9. There exists a positive constant K such that 

k 



Sfc(sgn;[-fe,-a]U[a,fe])<X- 



^fk 



.11) 



Given a sequence of Hamiltonians {Hn} with gapped spectra, one may choose 
a and h and shift i/„, if necessary, so that the spectrum of each i?„ is contained 
in [—6, —a] U [a, b] and the eigenvalues corresponding to occupied states belong to 
[—6, —a]. Then we obtain the following decay bound for the density matrix: 



\[Pn 



< K- 



-id(i,]) 



where ^ 



2 h 



a 



12) 



Under the bounded maximal degree condition, the rate of decay is independent of n. 



A few remarks on 
• Since 



12 ) are in order: 



.12 ) relies directly on a polynomial approximation of the step function, 
we do not need here the extra term 5 found in 



Unfortunately, it is not possible to assess whether (8.121 may be useful in 
practice without an explicit formula - or at least an estimate - for the constant 
K. The asymptotic decay rate, however, is faster than exponential and indeed 
faster than for other bounds; a comparison is shown in Fig. |8.9| (top). Notice 
that this logarithmic plot is only meant to show the slope of the bound (which 
is computed for K = 1). 
A disadvantage of (8.12) is the requirement that the intervals containing the 



spectra cr{Hn) should be symmetric with respect to 0. Of course one may 
always choose a and b so that this hypothesis is satisfied, but the quality of 
the decay bound deteriorates if b (or —b) is not close to the maximum (resp., 
minimum) eigenvalue; see Fig. |8.9| (bottom). The blue curve shows the slope 
of the decay bound for a — 0.25 and 5 = 1, in a logarithmic scale. In green 
we display the behavior of the first row of the density matrix associated with 
a tridiagonal 100 x 100 matrix with spectrum in [—1,-0.25] U [0.25, 1]. The 
red plot refers to the first row of the density matrix associated with a matrix 
with spectrum in [—0.4375, — 0.25]U [0.25, 1]. The first matrix is clearly better 
approximated by the decay bound than the second one. 
As one can see from the two plots in Fig. 



8.9 even ioi c = K = 1 both types 



of decay bounds are rather conservative, and estimating the truncation bandwidth ifi 
needed to achieve a prescribed error from these bounds would lead to an overly large 
band. Hence, the bounds may not be very useful in practice. For further discussion 
of these issues, see section 18.9 
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Fig. 8.9. Top: logarithmic plot of Hassan (blue) and Bernstein-type (green) decay bounds, for 
a 100 X 100 tridiagonal matrix with spectrum in [—1, —0.25] U [0.25, 1]. The first row of the "exact" 
density matrix is plotted in red. Bottom: logarithmic plot of Hasson decay bounds (blue) and first 
rows of density matrices associated with matrices with different eigenvalue distributions (red and 
green). 



8.5. Further results. Let us assume again that we have a sequence {i/„} of 
Hermitian n x ji Haniiltonians (with n — U}, ■ n^, rif, fixed, n^ — >■ oo) such that 

• The matrices Hn are banded with uniformly bounded bandwidth, or sparse 
with graphs having uniformly bounded maximum degree; 

• the spectra a(Hn) are uniformly bounded; 

• the sequence {Hn} has a "stable" spectral gap, i.e., there exist real numbers 
9i < 52 such that [gi, (72] H cr(i?„) = for sufficiently large n. 

In this subsection we let 

• i-j. := {g2 + .gi)/2 (Fermi level), 

• J '■— 92 — fJ' — fJ- — gi (absolute spectral gap). 

Note that because of the uniformly bounded spectra assumption, the absolute spectral 
gap is within a constant of the relative gap previously defined. 

Chui and Hasson study in |29j the asymptotic behavior of the error of best 
polynomial approximation for a sufficiently smooth function / defined on the set 
/ = [—5, —a] U [a,b], with < a < 6. Denote as C{I) the space of real-valued con- 
tinuous functions on /, with the uniform convergence norm. Then we have (see |29| 
Thm. 1] and ^84j): 

Theorem 8.10. Let f G C{I) be .such that f\[-b,-a] ^^ the restriction of a function 
/i analytic on the left half plane Re z < and f\[aM ^^ ihe restriction of a function 
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/2 analytic on the right half plane Rez > 0. Then 



\imsup[Ekif,I)f' < 

where Ek{f, I) is the error of best polynomial approximation for f on I. 

The authors of [25] observe that the above resuh cannot be obtained by extending 
f{x) to a continuous function on [—b, b] and applying known bounds for polynomial 
approximation over a single interval. Theorem |8.10| looks potentially useful for our 
purposes, except that it provides an asymptotic result, rather than an explicit bound 
for each value of k. Therefore, we need to reformulate the argument in J29j. To this 



end, we prove a variant of Bernstein's Theorem (cf. Theorem 8.7) adapted to our 
goals. Instead of working on the interval [—1, 1] we want to bound the approximation 
error on the interval [a^,6^]. 

Theorem 8.11. Let f € C{[a'^,b'^]) be the restriction of a function f analytic in 
the interior of the ellipse Ea^y with foci in a^,b^ and a vertex at the origin. Then, 
for all ^ with 

_ r, JL h 

l<e<e:= 



5' 
there exists a constant K such that 

'1 



k 



Ek{f,[a\b^])<K 
Proof. The proof closely parallels the argument given in WB for the proof of 



Theorem 
in ±1 anc 



8.7 First of all, observe that the ellipse E^ in Bernstein's Theorem has foci 
vertices in ±(x + l/x)/2 and ±(x— l/x)/2. The parameter x is the sum of 
the lengths of the semiaxes. Similarly, the ellipse fa^.b^ has foci in a^, b^ and vertices 
in 0, a^ + b"^ and (a^ + 6^)/2 ± iab. Also observe that £, is the sum of the lengths of the 
semiaxes oi Ea^y, normalized w.r.t. the semifocal length, so that it plays exactly the 
same role as x for E^. Now we look for a conformal map that sends an annulus in the 
complex plane to the ellipse where / is analytic. When this ellipse is £^, a suitable 
map is u = c{v) = (v + l/w)/2, which sends the annulus x^^ < bl < X to E^. When 
the desired ellipse has foci in a^,b^, we compose c{v) with the change of variable 

thus obtaining a function that maps the annulus A = {£,~^ < \v\ < ^} to an ellipse. 
Denote this ellipse as i^a^.b^^^ and observe that it is contained in the interior of fa^.fes. 
Therefore we have that the function 



/(V(c(«))) = / 



1 / 1 



2 V V b^ 



b^ 



is analytic on A and continuous on \v\ = ^. The proof now proceeds as in the original 
Bernstein Theorem. The Laurent expansion 



oo 



f{ip{c{v))) = } ^ a^v 
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converges in A with a_y = a,y. Moreover, we have the bound 



1 
27ri 



fWciv))) 



\v\=i 



.,l'+l 



dv 



< 






where Af (^) is the maximum value (in modulus) taken by / on the ellipse £^^2^,2^^. 

Now observe that u = c{v) describes the real interval [—1, 1] for \v\ = 1, so for 
u e [—1, 1] we have 



f{ip{u)) =ao + 2^at,Ti,{u), 



where T^{u) is the i/-th Chebyshev polynomial. Since V'(") is a linear transforma- 
tion, we have Ek{f{z), [a^, 6^]) = Ek{f{u), [—1, 1]), so from the theory of Chebyshev 
approximation [92j we obtain 



Ekif da', b']) ^ Ekifiu), [-1,1]) <2M{0 E ^' 






hence the thesis. Note that the explicit value of K is computable. D 
The following result is based on [5S1 Thm. 1]. 



Theorem 8.12. Let f € C{I) be as in Theorem 8.10 Then, for all ^ with 

. - a + b 

l<e<e:= r, 

a ~ b 

there exists C > independent of k such that 

Ek{f,i)<cr^- 



Proof Let Pj, and Qk be polynomials of best uniform approximation of degree k 
on the interval [a^,6^] for the functions /2(v^) ^nd f2{\/x)/\/x, respectively. Then 



by Theorem 8.11 there are constants Ki and K2 such that 



and 



max |P,(x)-/2(VS)|<ifir' 



inax \Qk{x) - f2{V^)/V^\ < K^C"" ■ 



.13) 



.14) 



We use the polynomials Pk and Qk to define a third polynomial i?2fc+i(a;) :— [Pk{x'^) + 
a;Qfe(a;^)]/2, of degree < 2A;-|-1, which approximates f{x) on [a, 6] and has small norm 
on [—6, —a]. Indeed, from (8.13) and (8.141 we have: 



max \R2k+i(x) - f{x)\ < - max \Pkix'^) ~ f{x)\ + - max \xQkix'^) - f{x)\ 

x^[a,o\ Z x£[a,o\ Z x^[a,o\ 



- l^'^^' + 1^^'^^' 



K, + bK2 ^_k 



.15) 
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and 

max |i?2fe+i(2:)| < I max \Pkix^) - fix) + f{x) - xQk{x'')\ (8.16) 
< \ max |P,.(x2) - !{x)\ + \ max ^Q^ix") - /(x)| < ^l±^^-\ (g.ir) 
Similarly, we can find another polynomial 5*2^+1 (a;) such that 

max |52.+i(x) - !{x)\ < ^^l±^^^k (g^^g) 

£C^ — fc^ — a Z 



and 



max \S2k+i{x)\ < ^3 + &^4 ^-fc ^g^^g^ 

xtE[a,oJ Z 



Then, from the inequalities (8.15l-(8.19) we have 

max|i?2fc+i(a::) + S'2fe+i(a;) - /(x)| < max |i?2fe+i(a;) - /(a;)| + max |S'2fe+i(a;)| 

xel xe[a,b] xe[a,b] 

+ max |S'2fe+i(a;) - /(x)| + max |i?2A;+i(a;)| 

x£[—b, — a] x^[—b, — a] 

< {Ki +K3 + b{K2 + Ki))r'' , 
and therefore 

Ek{fJ)<Vl{Ki+K-i + b{K2 + Ki))c'^: 
for odd values of k, and 

EkifJ) < aKi +Ks + b{K2 + Ki))Ci 

for even values of k. This completes the proof. D 

In the following we assume, without loss of generality, that k is odd. In order to 
obtain bounds on the density matrix, we apply Theorem |8.12| to the step function / 
defined on / as follows: 

., , J 1 for —h < X < —a 
J{x) = |o for a<x<b, 

i.e., / is the restriction of fi{z) = 1 on [—b,—a] and the restriction of f2{z) = on 
[a,b]. Here the polynomial approximation of f2{y/x), J2{,\f£)l \fx and /i(-\/— a?) is 
exact, so we have K\ = K2 = K-^ = 0. As for ii'4, observe that |l/-\/z| achieves its 
maximum on the vertex oiEa?}^^^ with smallest abscissa; therefore we have 

where 



M{i) = with zo = 



2(^+^7 ' b^ 



— a 
1 
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Moreover, we find R2k+i{x) = and S2k+i{x) = (1 + xVk{x'^))/2, where Vk{x) is the 
polynomial of best uniform approximation for 1/y/x on [a^,6^]. Thus, we obtain the 
bound 



where C is given by 



Ek{f,I)<CC 



C^^Kib. 



Let us now apply this result to our sequence of Hamiltonians. We will assume 
that the matrices are shifted so that /x = 0, that is, we replace each H„ by iJ„ — M^n- 
Under this hypothesis, the natural choice for a is a = 7, whereas b is the smallest 
number such that a{Hn) C [—6, —a] U [a, b] for every n. 

Using the same argument used in section |8.3| for the derivation of matrix decay 
bounds (see also [T^] and [H]), we can obtain bounds on the off-diagonal entries of 
f{Hn). If Hn is banded with bandwidth 771 independent of n, we have 



\[PnU = \[f{H^)U < V^^^^r^: 



whereas if _ff„ has a more general sparsity pattern we obtain 

\[PnU 



i[/(ff„)]..i<v^^&r 



.20) 



.21) 



where dn{i,j) is the distance between nodes i and j in the graph Gn associated with 
Next, we compare the bounds derived in this section with those for the Fermi- 



Dirac approximation of the step function obtained in section 8.1 using a suitable 
choice of the inverse temperature /3. Recall that if £^ denotes the regularity ellipse 
for the Fermi-Dirac function, the earlier bounds for the banded case are: 



|[^, 



71 It J 



< 



2M(x) n 



X 



1 



.22) 



For ease of computation, we assume in this section that /i = and that the spectrum of 
each matrix iJ„ is contained in [—1, 1]. As explained in section 8.1 once 7 is known, 
we pick a tolerance S and compute /? so that the Fermi-Dirac function provides a 
uniform approximation of the step function with error < 6 outside the gap: 



/3 > - In 

7 



l-S 



Then the supremum of the set of admissible values of x, which ensures optimal asymp- 
totic decay in this framework, is 



X = 



V^ 



//3. 



Figures 



.10 



and 



8.11 



compare the values of 1/ ^ and 1/ x (which characterize 



the behavior of the bounds (8.201 and (8.22), respectively). Note that in general we 



find 1/^ < 1/x; this means that the asymptotic decay rate is higher for the bound 
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Fig. 8.10. Comparison of parameters l/J and 1/x/or several values of the spectral gap. Here 
5 = 10-5. 




Fig. 8.11. Logarithmic plot of parameters 1/ g and 1/ x w.r.t. several values of the spectral gap. 
HereS = lO'^. 



based on disjoint interval approximation. Moreover, the disjoint interval method 
directly approximates the step function and therefore does not require one to choose 
a tolerance for "intermediate" approximation. As a result, the bounds based on 
disjoint interval approximation prescribe a smaller truncation bandwidth m in the 
approximation to the spectral projector in order to achieve a given level of error. For 
instance, in the tridiagonal case {m = 1) we observed a factor of three reduction in 
m compared to the previous bounds, independent of the size of the gap. 

8.6. Dependence of the rate of decay on the spectral gap. As already 
mentioned in section l4J the functional dependence of the decay length (governing the 
rate of decay in the density matrix) on the spectral gap has been the subject of some 
discussion; see, for instance, [21 [ZHl ESI 11041 11271 1140| . Some of these authors have 
argued that the decay length decreases like the square root of the gap if the Fermi 
level is located near one of the gap edges (i.e., close to either £„^ or to e„^+i), and 
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like the gap itself if the Fermi level falls in the middle of the gap. These estimates 
hold for the small gap limit. 

In this section we address this problem by studying how the decay described by 



the bounds ( 8.20 ) and ( 8.21 ) behaves asymptotically with respect to 7 or, equivalently, 
with respect to a (see the notation introduced in the previous section) . Note that we 
are assuming here that the Fermi level falls exactly in the middle of the gap. 



Let us rewrite (8.201 in the form 

|[P„]^,|<Ce^"l'-^'l/'", 
where 

a = - In £ = - In 

2^2 \b-a 

For a fixed m, the decay behavior is essentially described by the parameter a. Let us 
assume for simplicity of notation that 6 = 1, so that the spectral gap is normalized 
and the expression for a becomes 

1, fl + a 
a = - In 



2 \l-a 
The Taylor expansion of a for a small yields 

a = a+ — +o{a ) . 

o 

Therefore, for small values of 7, the decay behavior is described at first order by the 
gap itself, rather than by a more complicated function of 7. This result is consistent 
with similar ones found in the literature [70l |73l [140J . The fact that some systems 
exhibit density matrix decay lengths proportional to the square root of the gap (see, 
e.g., |73| ) does not contradict our result: since we are dealing here with upper bounds 
a square root-dependence, which corresponds to faster decay for small a, is still consis- 
tent with our bounds. Given that our bounds are completely general, it does not come 
as a surprise that we obtain the more conservative estimate among the alternatives 
discussed in the literature. 

8.7. Dependence of the rate of decay on the temperature. Another issue 
that has stirred some controversy in the literature concerns the precise rate of decay 
in the density matrix in metals at positive temperature; see, e.g., the results and 
discussion in [3l IMl |70| . Recall that in metals at positive temperatures T, the density 
matrix F„ — fpDiHn) decays exponentially. The question is whether the decay length 
is proportional to T or to Vt, for small T. Our approach shows that the decay length 
is proportional to T. 



Indeed, from the analysis in section 8.1 in particular Theorems |8.1| and 



we find that the decay length a in the exponential decay bound (8.1) (or, more 



generally, the decay length 9 in the bound (8.5)) behaves like Inx, where - assuming 



for simplicity that /i = 0, as before - the parameter x is any number satisfying 



Letting x ~ 7r//3 = ■nkgT and observing that for small x 
In I a; + v 1 + x^ ) — x + o{x ), 
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we conclude that, at low temperatures, the decay length is proportional to fc^T. This 
conclusion is in complete agreement with the results in [SH [7D] . To the best of our 
knowledge, this is the first time this result has been established in a fully rigorous 
and completely general manner. 

8.8. Other approaches. Decay bounds on the entries of spectral projectors can 
also be obtained from the contour integral representation 

^" " i / ^^^" " Hr^y^dz , (8.23) 

where F is a simple closed curve (counterclockwise oriented) in C surrounding a por- 
tion of the real axis containing the eigenvalues of iJ„ which correspond to the occupied 



states and only those. Componentwise, (8.23) becomes 

[Pnh = ^ / [(^^" - ^")"'],j dz, l<i,j<n, 
from which we obtain 



I [Pnh I < ^ / I [i^^n - H.^r'] ^^\dz, 1 < I, J < n. 

Assume the matrices i?„ are banded, with uniformly bounded spectra and bandwidths 
as n -> oo. By [34l Prop. 2.3] there exist, for all z E T, explicitly computable constants 
c(z) > and < A(z) < 1 (independent of n) such that 

|[(z/„-iJ„)-'],J<c(z)[A(z)]l^-^"l, (8.24) 

for all i,j = 1, . . . ,n. Moreover, c and A depend continuously on z G F. Since F is 
compact we can set 

c — inaxc(z) and A = maxA(z). (8.25) 

zer zer 

Now let us assume that the matrices iJ„ have spectral gaps 7„ satisfying inf„ 7„ > 0. 
It is then clear that c is finite and that A G (0,1). Hence, we obtain the following 
bound: 

|[P4.,|<(c.^)aI-^I, (8.26) 

for all i,j = l,...,n, where ^(F) denotes the length of F. Finally, letting C — c- -^ 
and a = — In A we obtain the exponential decay bounds 

|[Pn]^j|<C-e-"l*-^'l, l<t,j<n, (8.27) 

with both C > and a > independent of n. As usual, the bounds can be easily 
extended to the case of general sparsity patterns. One disadvantage of this approach 
is that explicit evaluation of the constants C and a is rather complicated. 

The integral representation (8.23 1 is useful not only as a theoretical tool, but also 
increasingly as a computational tool. Indeed, quadrature rules with suitably chosen 
nodes zi, . . . , z^ g F can be used to approximate the integral in (8.23), leading to 

fc 
P„«^w'^(^,/„-i?„)^' (8.28) 
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for suitable quadrature weights wi, . . .Wk- Note that this amounts to a rational ap- 
proximation of P„ = h{Hn). In practice, using the trapezoidal rule with a small 
number of nodes suffices to achieve high accuracy, due to the exponential convergence 
of this quadrature rule for analytic functions [3H|- Note that if P„ is real then it is 
sufficient to use just the Zi in the upper half-plane and then take the real part of the 
result [65! page 307]. If the spectral gap 7„ for Hn is not too small, all the resol- 
vents {ziln — Hn)^^ decay rapidly away from the main diagonal, with exponential 
rate independent of Uf.. Hence, 0{n) approximation is possible, at least in principle. 



Rational approximations of the type (8.28) are especially useful in those situations 



where only selected entries of P„ are required. Then only the corresponding entries 
of the resolvents (zi/„ — Hn)^^ need to be computed. For instance, in some cases 
only the diagonal entries of P„ are needed '116'. In others, only entries in positions 
corresponding to the non-zero entries in the Hamiltonian H^ must be computed; this 
is the case, for instance, when computing the objective function {E) — Tr(P„_ff„) in 
density matrix minimization algorithms. Computing selected entries of a resolvent is 
not an easy problem. However, progress has been made on this front in several recent 
papers; see, e.g., [781 [82l [Ml 11241 [T26] . 

8.9. Computational considerations. In the preceding sections we have rig- 
orously established exponential decay bounds for zero-temperature density matrices 
corresponding to finite-range Hamiltonians with non- vanishing spectral gap ('insula- 
tors'), as well as for density matrices corresponding to arbitrary finite-range Hamilto- 
nians at positive electronic temperatures. Our results are very general and apply to 
a wide variety of physical systems and discretizations. Hence, a mathematical justi- 
fication of the physical phenomenon of 'nearsightedness' has been obtained, and the 
possibility of 0{n) methods firmly establishedr] 

Having thus achieved our main purpose, the question remains whether our esti- 
mates can be of practical use in the design of 0{n) algorithms. As shown in section^ 
having estimated the rate of decay in the density matrix P allows one to prescribe a 
priori a sparsity pattern for the computed approximation P to P. Having estimated 
an 'envelope' for the non-negligible entries in P means that one can estimate before- 
hand the storage requirements and set up static data structures for the computation 
of the approximate density matrix P. An added advantage is the possibility of using 
the prescribed sparsity pattern to develop efficient parallel algorithms; it is well known 
that adaptive computations, in which the sparsity pattern is determined 'on the fly', 
may lead to load imbalances and loss of parallel efficiency due to the need for large 
amounts of communication and unpredictable memory accesses. This is completely 
analogous to prescribing a sparsity pattern vs. using an adaptive one when computing 
sparse approximate inverses for use as preconditioners when solving linear systems, 
see HHj. 

Most of the 0{n) algorithms currently in use consist of iterative schemes produc- 
ing increasingly accurate approximations to the density matrix. These approximations 
may correspond to successive terms in an expansion of P with respect to a prescribed 
basis [531 IHOl ISI] , or they may be the result of a gradient or descent method in density 
matrix minimization approaches [53J [53J [TSJ |23] • Closely related methods include pu- 
rification and algorithms based on approximating the sign function [95 ; we refer again 
to pOl l97l I113J for recent surveys on the state of the art of linear scaling methods for 



* Heuristics relating the "nearsightedness range of electronic matter" and the linear complexity 
of the Divide-and-Conquer method of Yang |138) . essentially a domain decomposition approach to 
DFT, were already given by Kohn himself; see, e.g., I75II105| . 
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electronic structure. Most of these algorithms construct a sequence of approximations 

p(0) p(l) p(fe) 

which, under appropriate conditions, converge to P. Each iterate is obtained from 
the preceding one by some matrix-matrix multiplication, or powering, scheme; each 
step introduces new nonzeros (fill-in), and the matrices P^'^^ become increasingly 
dense. The exponential decay property, however, implies that most of these nonzeros 
will be negligible, with only 0{n) of them being above any prescribed threshold S > 
0. Clearly, knowing a priori the location of the non-negligible entries in P can be 
used to drastically reduce the computational burden and to achieve linear scaling, 
since only those entries need to be computed. Negligible entries that fall within the 
prescribed sparsity pattern may be removed using a drop tolerance; this strategy 
further decreases storage and arithmetic complexity, but its implementation demands 
the use of dynamic data structures. 

An illustration of this use of the decay estimates can be found for instance in 
[13], where a Chebyshev expansion of the Fermi-Dirac function fpoiH) was used 
to approximate the density matrix at finite temperatures. Given a prescribed er- 
ror tolerance, exponential decay bounds were applied to the Fermi-Dirac function 
to determine the truncation bandwidth needed to satisfy the required approximation 
error. When computing the polynomial Pk{H) ~ fpoiH) using the Chebyshev ex- 
pansion, only entries within the prescribed bandwidth were retained. Combined with 
an estimate of the approximation error obtained by monitoring the magnitude of the 
coefficients in the Chebyshev expansion, this approach worked well for some simple ID 
model problems resulting in linear scaling computations. A related approach, based 
on qualitative decay estimates for the density matrix, was already used in |4]. These 
authors present computational results for a variety of ID and 2D systems including 
insulators at zero temperature and metals at finite temperature; see further |80j . 

Unfortunately, the practical usefulness of our bounds for more realistic calcula- 
tions is limited. The bounds are generally pessimistic and tend to be overly conserva- 
tive, especially for the case of zero or low temperatures. This is to be expected, since 
the bounds were obtained by estimating the degree of a polynomial approximation 
to the Fermi-Dirac matrix function needed to satisfy a prescribed error tolerance. 
These bounds tend to be rather pessimistic because they do not take into account 
the possibility of numerical cancellation when evaluating the matrix polynomial. For 
instance, the bounds must apply in the worst-case scenario where the Hamiltonian has 
non-negative entries and the approximating polynomial has nonnegative coefficients. 
Moreover, the bounds do not take into account the size of the entries in the Hamilto- 
nian, particularly the fact that the nonzeros within the band (or sparsity pattern) are 
not of uniform size but may be spread out over several orders of magnitude. It should 
be emphasized that the presence of a gap is only a sufficient condition for localization 
of the density matrix, not a necessary one: it has been pointed out, for example in 
[QOj , that disordered systems may exhibit strong localization even in the absence of a 
well-defined gap. This is the case, for instance, of the Anderson model of localization 
in condensed matter physics [2]. Obviously, our approach is unable to account for 
such phenomena in the zero temperature case. The theory reviewed in this paper is 
primarily a qualitative one; nevertheless, it captures many of the features of actual 
physical systems, like the asymptotic dependence of the decay rate on the gap size or 
on the electronic temperature. 

A natural question is whether the bounds can be improved to the point where 
they can be used to obtain practical estimates of the entries in the density matrix. In 
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order to achieve this, additional assumptions on the Hamihonians would be needed, 
making the theory less general. In other words, the price we pay for the generality of 
our theory is that we get pessimistic bounds. Recall that for a given sparsity pattern 
in the normalized Hamiltonians Hn our decay bounds depend on just one essential 
parameter, the gap 7. Our bounds are the same no matter what the eigenvalue 
distribution is to the left of the highest occupied level, e„^ , and to the right to the 
lowest unoccupied one, £„^+i. If more spectral information were at hand, the bounds 
could be improved. The situation is very similar to that arising in the derivation 
of error bounds for the convergence of Krylov methods, such as the CG method for 
solving symmetric positive definite linear systems Ax = 6; see, e.g., [571 Theorem 
10.2.6]. Bounds based on the spectral condition number K2{A) alone, while sharp, 
do not in general capture the actual convergence behavior of CG. They represent the 
worst-case behavior, which is rarley observed in practice. Much more accurate bounds 
can be obtained by making assumptions on the distribution of the eigenvalues of A. 
For instance, if A has only k distinct eigenvalues, then the CG method converges (in 
exact arithmetic) to the solution x* = A~^b in at most k steps. Similarly, suppose 
the Hamiltonian Hn has only k < n distinct eigenvalues (with /i not one of them), 
and that the multiplicities of the eigenvalues to the left of /i add up to n,, , the number 
of electrons. Then there is a polynomial Pfc(A) of degree at most fc — 1 such that 
Pk{Hn) — Pn, the density matrix. This is just the interpolation polynomial that takes 
the value 1 on the eigenvalues to the left of fj,, and zero on the eigenvalues to the 
right of fi. This polynomial "approximation" is actually exact. If A; <gC n, and is 
independent of n, then P„ will be a matrix with 0{n) nonzero entries; moreover, the 
sparsity pattern of P„ can be determined a priori from the graph structure of iJ„. 
Another situation is that in which the eigenvalues of iJ„ fall in a small number k of 
narrow bands, or tight clusters, with the right-most band to the left of /z well-separated 
from the left-most band to the right of 11. In this case we can find again a low-degree 
polynomial Pfc(A) with pi^(Hn) « Pn, and improved bounds can be obtained. 

The problem, of course, is that these are rather special eigenvalue distributions, 
and it is difficult to know a priori whether such conditions hold or not. 

Another practical issue that should be at least briefly mentioned is the fact that 
our bounds assume knowledge of lower and upper bounds on the spectra of the Hamil- 
tonians Hn , as well as estimates for the size and location of the spectral gap (this is 
also needed in order to determine the Fermi level fi). These issues have received a 
great deal of attention in the literature, and here we limit ourselves to observe that 
0{n) procedures exists to obtain sufficiently accurate estimates of these quantities; 
see, e.g., [S3]. 

9. Transformation to an orthonormal basis. In this section we discuss the 
transformation of an Hamiltonian from a non-orthogonal to an orthogonal basis. The 
main point is that while this transformation results in matrices with less sparsity, 
the transformed matrices retain the decay properties of the original matrices, only 
with (possibly) different constants. What is important, from the point of view of 
asymptotic complexity, is that the rate of decay remains independent of system size. 

We begin with a discussion of decay in the inverse of the overlap matrix. To this 
end, consider a sequence {S'„} of overlap matrices of size n = niy-rie, with rif, constant 
and rie increasing to infinity. We make the following assumptions: 

1. Each Sn is a banded symmetric positive definite (SPD) matrix with unit 
diagonal entries and with bandwidth uniformly bounded with respect to n; 

2. The spectral condition number (ratio of the largest to the smallest eigenvalue) 
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of each Sn, ii2iSn)i is uniformly bounded with respect to n. Because of 
assumption 1, this is equivalent to requiring that the smallest eigenvalue of 
Sn remains bounded away from zero, for all n. 

As always in this paper, the handedness assumption in item 1 is not essential 
and can be replaced by the weaker hypothesis that each Sn is sparse and that the 
corresponding graphs {Gn} have bounded maximal degree with respect to n. Actually, 
it would be enough to require that the sequence {Sn} has the exponential decay 
property relative to a sequence of graphs {G„} of bounded maximal degree. In order 
to simplify the discussion, and also in view of the fact that overlap matrices usually 
exhibit exponential or even super-exponential decay, we assume from the outset that 
each Sn has already been truncated to a sparse (or banded) matrix. Again, this 
is for notational convenience only, and it is straightforward to modify the following 
arguments to account for the more general case. On the other hand, the assumption 
on condition numbers in item 2 is essential and cannot be weakened. 

Remark 9.1. We note that assumption 2 above is analogous to the condition 
that the sequence of Hamiltonians {Hn} has spectral gap bounded below uniformly in 
n; while this condition ensures (as we have shown) the exponential decay property in 
the associated spectral projectors Pn, assumption 2 above insures exponential decay 
in the inverses (or inverse factors) of the overlap matrices. Both conditions amount 
to asking that the corresponding problems be uniformly well- conditioned in n. The 
difference is that the decay on the spectral projectors depends on the spectral gap of 
the Hamiltonians and therefore on the nature of the system under study (i.e., insu- 
lator vs. metallic system), whereas the sparsity and spectral properties of the overlap 
matrices depend on other features of the system, mainly the inter-atomic distances. 

In the following we shall need some basic results on the decay of the inverses [M] , 
inverse Cholesky factors [T^ and inverse square roots (Lowdin factors) [12] of banded 
SPD matrices; see also [71] . 

Let A be SPD and m-banded, and let a and b denote the smallest and largest 
eigenvalue of A, respectively. Write k for the spectral condition number K2{A) of A 
(hence, k = b/a). Define 





q:=^,- ^ and A := qi/™ . 
\/k + 1 








Furthermore, let Kq ■ 


= (1 + ^f/{2b). In [53], Demko et al. 


obtained the following 


bound on the entries of A^^: 










\[A-%\ < K \\'~^\ , l<i,j<n, 






(9.1) 


where K := max{a~^ 


Kq). Note that the bound (9.1) 'blows 


up' 


as K — > c». 


as one 


would expect. 











As shown in [15], the decay bound (9.1) and the handedness assumption on A 
imply a similar decay bound on the inverse Cholesky factor Z = R~^ = L~ , where 
A — R^R — LL^ with R upper triangular {L lower triangular). Assuming that A has 
been scaled so that maxi<i<„ An — 1 (which is automatically true if A is an overlap 
matrix corresponding to a set of normalized basis functions), we have 

\Z^,\<Ki\=-\ j>t, (9.2) 

with Ki = K ^^^^ ; here K, A are the same as before. We further note that while 
Ki > K, for some classes of matrices it is possible to show that the actual magnitude 
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of the {i,j) entry of Z (as opposed to the bound (9.2)) is actually less than the 
magnitude of the corresponding entry of A^^. This is true, for instance, for an 
irreducible M-matrix; see |15) . 

Finally, let us consider the inverse square root, A^^^"^. In [T^] the following bound 
is established: 

[A-i/2]„|<if2Al'-^'l, l<ij<n. (9.3) 

Here K2 depends again on the extreme eigenvalues a and b of A, whereas A — q^/™, 
where now q is any number satisfying the inequalities 

< q <1. 



1 



As before, the bound (9.3) blows up as k -^ cx), as one would expect. 

Introducing the positive scalar a = —In A, we can rewrite all these bounds in the 
form 

\B,j\<Ke-"^'-^K l<i,j<n 

for the appropriate matrix B and suitable constants K and a > 0. 

Let now {S'„} be a sequence of n x n overlap matrices, where n — ni, ■ n^ with 
rif, fixed and rig — )■ 00. Assuming that the matrices Sn satisfy the above assumptions 
1-2, then their inverses satisfy the uniform exponential decay bounds ( |9.1| , with K 
and A constant and independent of n. Hence, as discussed in section [61 for any given 
e > there exists an integer fh independent of n such that each matrix Sn in the 
sequence can be approximated, in norm, by an m-banded matrix with an error less 
than e. As usual, this result can be extended from the banded case to the sparse case, 
assuming that the corresponding graphs Gn have bounded maximal degree as n ^ 00. 
Moreover, under assumptions 1-2 above, the inverse Cholesky factors Z„ satisfy a 



uniform (in n) exponential decay bound of the type (9.2), and therefore uniform 
approximation with banded triangular matrices is possible. Again, generalization 
to more general sparsity patterns is possible, provided the usual assumption on the 
maximum degree of the corresponding graphs Gn holds. Similarly, under the same 
conditions we obtain a uniform rate of exponential decay for the entries of the inverse 

— 1/2 

square roots Sn , with a corresponding result on the existence of a banded (or 
sparse) approximation. 

Let us now consider the sequence of transformed Hamiltonians, ij„ = Z^HnZn- 
Here Z„ denotes either the inverse Cholesky factor or the inverse square root of the 
corresponding overlap matrix Sn- Assuming that the sequence {Hn} satisfies the off- 
diagonal exponential decay property and that {Sn} satisfies assumptions 1-2 above, it 
follows from the decay properties of the matrix sequence {Z„} that the sequence {Hn} 
also enjoys off-diagonal exponential decay. This is a straightforward consequence of 
the following result, which is adapted from a similar one for infinite matrices due to 
Jaffard \7T, Proposition 1]. 

Theorem 9.2. Consider two sequences {An} and {-B„} ofnx n matrices (where 
n — > 00 j whose entries satisfy 

IK]„-|<cie-"l*-J"l and |[B„],,| < C2 e-"!'-^"! , l<i,j<n, 

where Ci, C2 and a > are independent of n. Then the sequence {C„}, where Cn = 
AnBn, satisfies a similar hound: 

|[C„],,|<ce-"'l^-^"l, l<i,j<n, (9.4) 
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for any < a' < a, with c independent ofn. 

Proof. First note that the entries of each A„ clearly satisfy 

WAihjl < ci e""'l'"-''l for any a' < a . 

Let oj = a — a'. Then a; > and the entries [Cn]ij of C„ ~ AnBn satisfy 

\[Cn]ij\ < Yl \[An]tk\ \[Bn]kj\ < C1C2 I ^C ^'^ ■" | C 
fe=l \*: = 1 

To complete the proof just observe that for any j, 

n j—1 n—j 00 00 

j2 e-"i^-j"i = Y. ^""' + E ^^"' < E ^~"' + E 

k=l fc=0 fc=l fe=0 fc=l 



e 



.k_ 1+e- 



1-e- 



Since the last term is independent of n, the entries of C„ satisfy (9.4 1 with a constant 
c that is also independent of n. D 

The foregoing result can obviously be extended to the product of three matrices. 
Thus, the entries of the matrix sequence {Hn}, where ff„ = Z^HnZn, enjoy the 
exponential off-diagonal decay property: 



i-tlnlij 



<ce-"l'-^l, l<i,j<n, 



for suitable constants c and a > 0. 

Alternatively, one could first approximate i7„ and Z„ with banded matrices 
Hn and Z^ and then define the (approximate) transformed Hamiltonian as _ff„ := 
Z^HnZn, possibly subject to further truncation. Using the fact that both H^ and 
Zn have 2-norm bounded independently of n, it is easy to show that the final approx- 
imation error can be reduced below any prescribed tolerance by reducing the error 
in Hn and Zn- Hence, with either approach, the transformed Hamiltonians H„ can 
be approximated uniformly in n within a prescribed error by banded matrices of con- 
stant bandwidth, just like the original ("non-orthogonal") Hamiltonians. While the 
bandwidth of the approximations will be larger than for the original Hamiltonians, 
the truncated matrices retain a good deal of sparsity and asymptotically contain 0{n) 
non-zeros. Hence, we have a justification of the statement (see section [I]) that in our 
theory we can assume from the outset that the basis set {4>i}f^i is orthonormal. 

In Fig. |9.1| we show the Hamiltonian H for the already mentioned linear alkane 



C52H106 (see section 8.2 1 discretized in a Gaussian-type orbital basis (top) and the 
'orthogonalized' Hamiltonian H — Z^HZ (bottom). This figure shows that while 
the transformation to orthogonal basis alters the magnitude of the entries in the 
Hamiltonian, the bandwidth of H (truncated to a tolerance of 10~*) is only slightly 
wider than that of H . In this case the overlap matrix S is well-conditioned, hence the 
entries of Z exhibit fast decay. An ill-conditioned overlap matrix would lead to a less 
sparse transformed Hamiltonian H. 

As usual, the handedness assumption was made for simplicity of exposition only; 
similar bounds can be obtained for more general sparsity patterns, assuming the 
matrices Hn and Sn have the exponential decay property relative to a sequence {Gn} 
of graphs having maximal degree uniformly bounded with respect to n. 

It is important to emphasize that in practice, the explicit formation of iJ„ from 
Hn and Z„ is not needed and is never carried out. Indeed, in all algorithms for 
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Fig. 9.1. Magnitude of the entries in the Hamiltonian for the C52H106 linear alkane. Top: 
non-orthogonal (GTO) basis. Bottom: orthogonal basis. White: < 10~*; yellow: 10~® — 10~^; 
green: 10~^ — 10^''; blue: 10"* — 10~^; black: > 10"'^. Note: nz refers to the number of 'black' 
entries. 



electronic structure computation the basic matrix operations are matrix-matrix and 
matrix-vector products, which can be performed without expHcit transformation of 
the Hamiltonian to an orthonormal basis. On the other hand, for the study of the 
decay properties it is convenient to assume that all the relevant matrices are explicitly 
given in an orthogonal representation. 

One last issue to be addressed is whether the transformation to an orthonormal 
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basis should be effected via tlie inverse Cfiolesky factor or via tlie Lowdin (inverse 
square root) factor of the overlap matrix. Comparing the decay bounds for the two 
factors suggests that the inverse Cholesky factor should be preferred (smaller a). 
Also note that the inverse Cholesky factor is triangular, and its sparsity can be in- 
creased by suitable reorderings of the overlap matrix. The choice of ordering may 
also be influenced by the computer architecture used. We refer to [30] for the use of 
bandwidth-reducing orderings like reverse Cuthill-McKee, and to |25] for the use of 
space-filling curve orderings like the 3D Hilbert curve to improve load balancing and 
data locality on parallel architectures. In contrast, the Lowdin factor is a full symmet- 
ric matrix, regardless of the ordering. On the other hand, the multiplicative constant 
c is generally smaller for the Lowdin factor. Closer examination of a few examples 
suggests that in practice there is no great difference in the actual decay behavior of 

— 1/2 

these two factors. However, approximating Sn is generally more expensive and 
considerably more involved than approximating the inverse Cholesky factor. For the 
latter, the AINV algorithm TS^ and its variants [211 11101 I136J are quite efficient and 
have been successfully used in various quantum chemistry codes. For other 0{n) al- 
gorithms for transformation to an orthonormal basis, see [7^ 11001 1122] . In all these 
algorithms, sparsity is preserved by dropping small entries in the course of the com- 
putation. Explicit decay bounds for the Z„ factors could be used, in principle, to 
establish a priori which matrix elements not to compute, thus reducing the amount of 
overhead. Notice, however, that even if asymptotically bounded, the condition num- 
bers K2{Sn) can be fairly large, leading to rather pessimistic decay estimates. This 
is again perfectly analogous to the situation with the condition number-based error 
bounds for the conjugate gradient (CG) method applied to a linear system Ax = b. 



And indeed, both the CG error bounds and the estimates (9.1) are obtained using 



Chebyshev polynomial approximation for the function /(A) = A ^. 

10. The vanishing gap case. In this section we discuss the case of a sequence 
{Hn} of bounded, finite range Hamiltonians for which the spectral gap around the 
Fermi level /i vanishes as n — > oo. Recall that this means that inf„7„ — 0, where 
7„ := e„ ^_i — £nj is the HOMO-LUMO gap for the n-th Hamiltonian; it is assumed 

here that e„" < ii < £,j +i for all n — nt ■ rie. The reciprocal 7~^ of the gap can 
be interpreted as the condition number of the problem [109j , so a vanishing spectral 
gap means that the conditioning deteriorates as rig —>■ oo and the problem becomes 
increasingly difficult. 

As already mentioned, in the zero-temperature limit our decay bounds blow up, 
and therefore lose all meaning as 7„ — > 0. On the other hand, we know a priori that 
some type of decay should be present, in view of the results in section [7] A general 
treatment of the vanishing gap case appears to be rather difficult. The main reason is 
that in the limit as /3 — >■ cxd the Fermi-Dirac approximation to the Heaviside function 
becomes discontinuous, and therefore we can no longer make use of tools from classical 
approximation theory for analytic functions. Similarly, in the vanishing gap case the 



decay bounds (^8.27| based on the resolvent estimates (8.241 break down since c — J- oo 



and A ^- 1 in (^8.26 



Rather than attacking the problem in general, in this section we give a complete 
analysis of what is perhaps the simplest nontrivial example of a sequence {Hn} with 
vanishing gap. While this is only a special case, this example captures some of the 
essential features of the 'metallic' case, such as the rather slow off-diagonal decay 
of the entries of the density matrix. The simple model studied in this section may 
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appear at first sight to be too simple and unrealistic to yield any useful information 
about actual physical systems. However, calculation of the density matrix at zero 
temperature on a system composed of 500 Al atoms reported in [140] reveals a decay 
behavior which is essentially identical to that obtained analytically for a free electron 
gas, a model very close to ours (which is essentially a discrete variant of the one in 
[140] ). We believe that our analysis will shed some light on more general situations 
in which a slowly decaying density matrix occurs. 

We begin by considering the infinite tridiagonal Tocplitz matrix 
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H 



\ 
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(10.1) 
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which defines a bounded, banded, self-adjoint operator on ^^. The graph of this 
matrix is just a (semi-infinite) path. The operator can be interpreted as an averaging 
operator, or as a centered second-difference operator with a zero Dirichlet condition 
at one end, shifted and scaled so as to have spectrum contained in [—1,1]. From a 
physical standpoint, H is the shifted and scaled discrete one-electron Hamiltonian 
where the electron is constrained to the half-line [0, oo). 

For n even {n = 2 ■ n^, with rie G N) consider the n-dimensional approximation 



Hn — 
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(10.2) 



This corresponds to truncating the semi-infinite path and imposing zero Dirichlet 
conditions at both ends. Let now {ei, 62, . . .} denote the standard basis of £^, and let I 
denote the identity operator restricted to the subspace oil? spanned by e„_|_i, e„+2, ■ • ■• 
Letting 



^(n) :- 







the sequence {_//(•„■)} is now a sequence of bounded self-adjoint linear operators on (^ 
that converges strongly to H. Note that cr(-ff„) C [—1,1] for all n; also, ^ cr(i?n) 
for all even n. It is easy to see that half of the eigenvalues of i/„ lie in [—1, 0) and the 
other half in (0, 1]. We set /i = and we label as 'occupied' the states corresponding 



-(") 



J") 

^/2• 



to negative eigenvalues. The spectral gap of each _ff„ is then e„ /j+i 

The eigenvalues and eigenvectors of H^ are known explicitly |35l Lemma 6.1]. 
Indeed, the eigenvalues, in descending order, are given by e]^ = cos ( ^^ 1 (with 1 < 

k < n) and the corresponding normalized eigenvectors are given by wj^" — (wj," (j)) 
with entries 



i"\j)- 



sm 



jkiT 
n+l 



I <j <n. 
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Note that the eigenvalues are symmetric with respect to the origin, and that the 
spectral gap at vanishes, since e„'/2 1 1 — ^£„/2 ~^ as n — ^ cx). We also point to 
the well known fact that the eigenvectors of this operator are strongly delocalized. 
Nevertheless, as we will see, some localization (decay) is present in the density matrix, 
owing to cancellation (i.e., destructive interference). 

Now let P„ be the zero-temperature density matrix associated with i/„, i.e., 
the spectral projector onto the subspace of C" spanned by the eigenvectors of 7J„ 
associated with the lowest Ug eigenvalues (the occupied subspace). We extend P„ to 
a projector acting on P by embedding P„ into an infinite matrix P(n) as follows: 

p ._{ Pn 

nn) — y 

Note that P(„) is just the orthogonal projector onto the subspace of i"^ spanned by 
the eigenvectors of H(n) associated with eigenvalues in the interval [—1,0). Moreover, 
Tr(P(„-)) = Tr(P„) = rank(P„) — ^ ~ n^. The limiting behavior of the sequence 
{P{n)} (hence, of {Pn}) is completely described by the following result. 

Theorem 10.1. Let H, Hn, and P(^n) be as described above. Then 
(i) H has purely absolutely continuous spectrumrj given by the interval [—1,1]. 

In particular, H has no eigenvalues. 

(ii) The union of the spectra of the n-dimensional sections iJ„ of H is everywhere 

dense in cr{H) = [—1, 1]. In other words, every point in [—1, 1] is the limit of 

a sequence of the form {ff^" } for n — >■ oo, where Sj^ G a{II„) and k — k{n). 

(Hi) The sequence {Hn} has vanishing gap: inf „ 7„ = 0. 

(iv) The spectral projectors P(„) converge strongly to P = h{II) where h{x) = 

X[_i.o)(a;); the characteristic function of the interval [—1,0). 
(v) P is the orthogonal projector onto an infinite- dimensional subspace of (? . 
Proof. Statements (i)-(ii) are straightforward consequences of classical results on 
the asymptotic eigenvalue distribution of Toeplitz matrices, while (iv)-(v) follow from 
general results in spectral theory. Statement (iii) was already noted (the eigenvalues 
of Hn are explicitly known) and it also follows from (i)-(ii). More in detail, statement 
(i) is a special case of Rosenblum's Theorem on the spectra of infinite banded Toeplitz 
matrices: see |108| or [T71 Thm. 1.31]. For the fact that the spectrum of H coincides 
with the interval [—1,1] and that the finite section eigenvalues ej," are dense in a{H) = 
[—1, 1] (statement (ii)), see the paper by Hartman and Wintner [SD] or the book by 
Grenander and Szego [Ml Chapter 5]. Statement (iv) can be proved as follows. For a 
linear operator A on £^, write Rx{A) = {A— XI)~^, with A ^ <^{A). A sequence {An} 
of self-adjoint (Hermitian) operators is said to converge in the strong resolvent sense 
to A if R\{An) — > R\{A) strongly for all A e C with Re A 7^ 0, that is: 

lim \\Rx{An)x - Rx{A)x\\ ^ for aU x e l"^ . 



>oo 



It is easy to check, using for instance the results in [T51 Chapter 2] , that the sequence 
{Hn} converges in the strong resolvent sense to H. Statement (iv) (as well as (ii)) 
now follows from [Wl Thm. VIII.24]. The fact (v) that P = h{H) is an orthogonal 



^ The absolutely continuous spectrum of a self-adjoint linear operator H on a Hilbert space Jif is 
the spectrum of the restriction of H to the subspace ,i^c Q -^' of vectors ^ whose spectral measures 
[i^ are absolutely continuous with respect to the Lebesgue measure. For details, see [1071 pages 

224-231]. 
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projector onto an infinite-dimensional subspace of (."^ follows from the fact that /i = 
is not an eigenvalue of H (because of (i)) and from the spectral theorem for self-adjoint 
operators in Hilbert space; see, e.g., |1071 Chapter VII] or |115l Chapter 12]. D 

The foregoing result implies that the Toeplitz matrix sequence {Hn\ given by 



( 10.2 ) exhibits some of the key features of the discrete Hamiltonians describing metal- 



lic systems, in particular the vanishing gap property and the fact that the eigenvalues 
tend to fill the entire energy spectrum. The sequence {H^} can be thought of as a 
ID 'toy model' that can be solved analytically to gain some insight into the decay 
properties of the density matrix such systems. Indeed, from the knowledge of the 
eigenvectors of i/„ we can write down the spectral projector corresponding to the 
lowest Uf, — n/2 eigenvalues explicitly. Recalling that the eigenvalues e^" are given in 
descending order, it is convenient to compute P„ as the projector onto the orthogonal 
complement of the subspace spanned by the eigenvectors corresponding to the n/2 
largest eigenvalues: 



P„ = /„-^4")(4")) 



k=l 



The (i, j) entry of P„ is therefore given by 

2 



[-'njzj — ^i ^n^j 



fc=i 



n+l 



jkn 
n + l 



For i = j , we find 



[^.].. = i-;^E' 



fc=i 



ik-K 
n + l 



for alH = 1, . . . , 71 and for all n. (10.3) 



Hence, for this system the charge density Pa is constant and the system essentially 
behaves like a non-interacting electron gas, see for example [SU]. We note in passing 



that this example confirms that the bound (7.2) is sharp, since equality is attained 

(10.4) 



for this particular projector. Moreover, the trigonometric identity 
. „ . , 1 



sm f sm ( 



implies for all i, j = 1, . . . , n: 



1 



[''"|«^;;ttS 



k=l 



[cos(6' + 0) - cos(6i - (/>)] 



[i + j)kT:\ f (i — j^ki: 

cos I ^ -h- - cos ' ^ ■^' 



n + l 



n+l 



From (10.5) it immediately follows, for all i and for all n, that 

[Pn]^,^+2l=Q. 1 = 1,2,... 



(10.5) 



(10.6) 



Since (10.31 and (10.61 hold for all n, they also hold in the limit as n — >■ oo. Hence, 



the strong limit P of the sequence of projectors {P(„)} satisfies Pa = 1/2 and Pij = 
for all j = i + 21, where i,l = 1,2,... To determine the remaining off-diagonal entries 
Pij (with j ^ i and j ^ i + 21) we directly compute the linfit of [Pn]ij as n — > oo, as 
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follows. Observe that using the substitution x = k/{n + 1) and taking the limit as 
n -^ cx) in ( 10.5 ) we obtain for alH > 1 and for all j y^ i + 21 (^ = 0, 1, . . .): 



P — 



cos[{i + j)Trx] dx — / cos[{i — j)TTx]dx 

"'0 



(-1)' 



(-1)' 



i + j 



1-- 3 



(10.7) 



It follows from (10.71 that \Pij\ is bounded by a quantity that decays only linearly in 



the distance from the main diagonal. As a result, 0{n) approximation of P„ for large 
n involves a huge prefactor. Therefore, from this very simple example we can gain 
some insight into the vanishing gap case. The analytical results obtained show that 
the density matrix can exhibit rather slow decay, confirming the well known fact that 
0{n) approximations pose a formidable challenge in the vanishing gap case. 

The 2D case is easily handled as follows. We consider for simplicity the case of a 
square lattice consisting of n^ points in the plane. The 2D Hamiltonian is given by 



H„ 



1 



{H„ 



Hn 



where the scaling factor ^ is needed so as to have cr(iJ„2) c [—1, 1]. The eigenvalues 
and eigenvectors of -ff„2 can be explicitly written in terms of those of -ff„; see, e.g., 
[55] . Assuming again that n is even, exactly half of the n^ eigenvalues of iJ„2 are 
negative (counting multiplicities) , the other half positive. As before, we are interested 
in finding the spectral projector associated with the eigenvectors corresponding to 
negative eigenvalues. Note again that the spectral gap tends to zero as n ^- oo. If 
P„2 denotes the spectral projector onto the occupied states, it is not difficult to show 
that 



P„2 = P„ ® (/„ - P„) + (/„ - P„) ® P„ 



(10.8) 



It follows from (10.8) that the spectral projector P„ 
where 



has a natural n x n block 



structure, 

• Each diagonal block is equal to ^/„; note that this gives the correct trace, 
Tr(P„2) = ^. 

• The {k,l) off-diagonal block 11^; is given by Hki — [Pn]ki{In — 2Pn)- Hence, 
each off-diagonal block has a 'striped' structure, with the main diagonal as 
well as the third, fifth, etc. off-diagonal identically zero. Moreover, every 
block life; with I ~ k + 2m {m > 1) is zero. 

This shows that in the 2D case, the rate of decay in the spectral projector is 
essentially the same as in the ID case. The 3D case can be handled in a similar 
manner, leading to the same conclusion. 

For this simple example we can also compute the entries of the density matrix at 
positive electronic temperature T > 0. Recalling that the density matrix in this case 
is given by the Fermi-^Dirac function with parameter /3 = l/(kBT) we have in the ID 
case (assuming /^ = 0) 



P^,= 



" sm ^ 
"^ fe=i 1 + exp 



(a^)-n(a^) 



/3cos 



(fe) 



(ia9) 
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Making use again of the trigonometric identity (10.4) and using the same substitution 
x = fc/(n + 1), we can reduce the computation of the density matrix element _Py for 
n ^- oo to the evaluation of the following integral: 



1 + exp (/3 cos TTx) 

Unfortunately, this integral cannot be evaluated explicitly in terms of elementary 
functions. Note, however, that the integral 

, cos {k-Kx) 
h- = I TT^ r ax 



/o 1 + exp (/? cos TTx) 
(where k is an integer) becomes, under the change of variable -kx — arccost, 



1 /■ cos (fc arccost) dt 



Hence, up to a constant factor, Ik is just the fcth coefficient in the Chebyshev expansion 
of the Fermi-Dirac function 1/(1 + e*^*). Since the Fermi-Dirac function is analytic on 
the interior of an ellipse containing the interval [—1, 1] and continuous on the boundary 
of such an ellipse, it follows from the general theory of Chebyshev approximation that 
the coefficients Ik decay at least exponentially fast &s k ^ cx); see, e.g, [51]. This 



in turn implies that the entries Pij given by (10.10 1 decay at least exponentially fast 
away from the main diagonal, the faster the larger the temperature is, as already 
discussed in section |8.7| Hence, for this special case we have established in a more 
direct way the exponential decay behavior already proved in general in section |8.1[ 
In the present case, however, for any value of /3 the decay rate of the entries Pij given 



by (10.10) can be determined to arbitrary accuracy by numerically computing the 
Chebyshev coefficients of the Fermi-Dirac function. 

We mention that a simple, one-dimensional model of a system with arbitrarily 
small gap has been described in [1^. The (continuous) Hamiltonian in [i^ consists 
of the kinetic term plus a potential given by a sum of Gaussian wells located at the 
nuclei sites Xi: 

with a > and cr > tunable parameters. The spectra of this family of Hamiltonians 
present a band structure with band gap proportional to ^/a/a. Note that the model 
reduces essentially to ours for a — > and/or for cr —>■ oo. On the other hand, while 
the gap can be made arbitrarily small by tuning the parameters in the model, for any 
choice of a > and cr > the gap does not vanish; therefore, no approximation of the 
infinite-size system with a sequence of finite-size ones can lead to a vanishing gap in 
the thermodynamic limit. This means that our bounds, when applied to this model, 
will yield exponential decay, albeit very slow (since the correlation lengths will be 
quite large for small a — > and/or for large a) . The model in |49| , on the other hand, 
can be useful for testing purposes when developing algorithms for metal-like systems 
with slowly decaying density matrices. 

11. Other applications. In this section we sketch a few possible applications 
of our decay results to areas other than electronic structure computations. 
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11.1. Density matrices for thermal states. In quantum statistical meclian- 
ics, tlie equilibrium density matrix for a system of particles subject to a heat bath at 
absolute temperature T is defined as 

P= , where Z = Tr(e"'^^). (11.1) 

As usual, /3 = {kBT)~^ where fes denotes the Boltzmann constant; see |102j . The 
matrix P is the quantum analog of the canonical Gibbs state. The Hamiltonian 
H is usually assumed to have been shifted so that the smallest eigenvalue is zero 
[571 page 112]. Note that P as defined in ( |11.1| is not an orthogonal projector. It is, 
however, Hermitian and positive semidefinite. Normalization by the partition function 
Z ensures that a{P) C [0, 1] and that Tr(P) = 1. 

It is clear that for increasing temperature, i.e., for T — >■ oo (equivalently, for 
/3 — ?► 0) the canonical density matrix P approaches the identity matrix, normalized by 
the matrix size n. In particular, the off-diagonal entries tend to zero. The physical 
interpretation of this is that in the limit of large temperatures the system states 
become totally uncorrelated. For temperatures approaching the absolute zero, on the 
other hand, the canonical matrix P tends to the orthogonal projector associated with 
the zero eigenvalue (ground state). In this limit, the correlation between state i and 
state j is given by the {i,j) entry of the orthogonal projector onto the eigenspace 
corresponding to the zero eigenvalue, normalized by n. 

For finite, positive values of T, the canonical density matrix P is full but decays 
away from the main diagonal (or, more generally, away from the sparsity pattern of 
H). The rate of decay depends on f3: the smaller it is, the faster the decay. Application 
of the bounds developed in sectionlSlto the matrix exponential is straightforward. For 
instance, the bounds based on Bernstein's Theorem take the form 



e ---|,:,|<C(/3)e-"'^(''^"), i^j, (11.2) 



where 

C(^) = ^2^0/9 ll^lb(«i-i)/2 and « = 21nx. 

In these expressions, x > 1 and ki > 1 are the parameters associated with the 
Bernstein ellipse with foci in —1 and 1 and major semi-axis ki, as described in section 
|8| Choosing x large makes the exponential term decay e""'''-*'-'^ very fast, but causes 



C(/3) to grow larger. Clearly, a smaller /3 makes the upper bound (11.21 smaller. 
Bounds on the entries of the canonical density matrix P can be obtained dividing 
through the upper bounds by Z. Techniques for estimating Z can be developed using 
the techniques described in [55]; see also [TT] . 



Although the bound ( 11.2 ) is an exponentially decaying one, it can be shown that 
the decay in the entries of a banded or sparse matrix is actually SMper-exponential. 
This can be shown by expanding the exponential in a series of Chebyshev polynomials 
and using the fact that the coefficients in the expansion, which can be expressed in 
terms of Bcssel functions, decay to zero super-exponentially; see [55] and also [55] • 
The decay bounds obtained in this way are, however, less transparent and more 



complicated to evaluate than (11.2). 

Finally, exponential decay bounds for spectral projectors and other matrix func- 
tions might provide a rigorous justification for 0{n) algorithms recently developed for 
disordered systems; see, e.g., [11711118] . 
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11.2. Quantum information theory. A related area of research where our 
decay bounds for matrix functions have proven useful is the study of quantum many- 
body systems in information theory; see, e.g., [3T1 [351 [3S1 11201 1121] . In particular, 
relationships between spectral gaps and rates of decay for functions of finite range 
Hamiltonians have been established in [31] using the techniques introduced in [12]. 
The exponential decay of correlations and its relation to the spectral gap have also 
been studied in [^2, 63 . 

As shown in |32j . exponential decay bounds for matrix functions play a crucial 
role in establishing so-called area laws for the entanglement entropy of ground states 
associated with bosonic systems. These area laws essentially state that the entangle- 
ment entropy associated with a 3D bosonic lattice is proportional to the surface area, 
rather than to the volume, of the lattice. Intriguingly, such area laws are analogous to 
those governing the Beckenstein-Hawking black hole entropy. We refer the interested 
reader to the recent, comprehensive survey paper [38] for additional information. 

11.3. Complex networks. The study of complex networks is an emerging field 
of science currently undergoing vigorous development. Researchers in this highly in- 
terdisciplinary field include mathematicians, computer scientists, physicists, chemists, 
engineers, neuroscientists, biologists, social scientists, etc. Among the mathematical 
tools used in this field, linear algebra and graph theory, in particular spectral graph 
theory, play a major role. Also, statistical mechanics concepts and techniques have 
been found to be ideally suited to the study of large-scale networks. 

In recent years, quantitative methods of network analysis have increasingly made 
use of matrix functions. This approach has been spearheaded in the works of Estrada, 
Rodriguez- Velazquez, D. Higham and Hatano; see, e.g., [3211101111111211131111], as 
well as the recent surveys [15] JH] and the references therein. Functions naturally 
arising in the context of network analysis include the exponential, the resolvent, and 
hyperbolic functions, among others. Physics-based justifications for the use of these 
matrix functions in the analysis of complex networks have been thoroughly discussed 
in [44: . 

For example, the exponential of the adjacency matrix A associated with a simple, 
undirected graph G = {V, E) can be used to give natural definitions of important 
measures associated with nodes in G, such as the subgraph centrality associated with 
node j, defined as C{i) = [e J^j, and the communicahility associated with two dis- 
tinct nodes i and j, defined as G{i,j) = [e ]ij. Other network quantities that can 
be expressed in terms of the entries in appropriate matrix functions of A include be- 
tweenness, returnability, vulnerability, and so forth. The graph Laplacian L = D — A, 
where D = diag((ii, . . . ,dn) with di denoting the degree of node i, is sometimes used 
instead of the adjacency matrix, as well as weighted analogues of both A and L. 

Most networks arising in real- world applications are sparse, often with degree dis- 
tributions closely approximated by power laws. Because the maximum degree in such 
"scale-free" networks increases as the number of nodes tends to infinity, one cannot 
expect uniform exponential decay rates to hold asymptotically for the matrix func- 
tions associated with such graphs unless additional structure is imposed, for instance 
in the form of weights. Nevertheless, our bounds for the entries of functions of sparse 
matrices can be used to obtain estimates on quantities such as the communicahility 
between two nodes. A discussion of locality (or the lack thereof) in matrix functions 
used in the analysis of complex networks can be found in [HI. We also refer the 
reader to |llj for a description of quadrature rule-based bounds for the entries of 
matrix functions associated with complex networks. 
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11.4. Tridiagonal eigensolvers. The solution of symmetric tridiagonal eigen- 
value problems plays an important role in many field of computational science. As 
noted for example in [130' , solving such problems is key for most dense real symmetric 
(and complex Hermitian) eigenvalue computations and therefore plays a central role 
in standard linear algebra libraries such as LAPACK and ScaLAPACK. Even in the 
sparse case, the symmetric tridiagonal eigenvalue problem appears as a step in the 
Lanczos algorithm. 

The efficiency of symmetric tridiagonal eigensolvers can be significantly increased 
by exploiting localization in the eigenvectors (more generally, invariant subspaces) 
associated with an isolated cluster of eigenvalues. It would be highly desirable to 
identify beforehand any localization in the eigenspace in a cost-effective manner, as 
this would lead to reduced computational costs |101l 1130] . It is clear that this prob- 
lem is essentially the same as the one considered in this paper, with the additional 
assumption that the matrix H is tridiagonal. Given estimates on the location of the 
cluster of eigenvalues and on the size of the gaps separating it from the remainder of 
the spectrum, the techniques described in this paper can be used to bound the entries 
in the spectral projector associated to the cluster of interest; in turn, the bounds can 
be used to identify banded approximations to the spectral projectors with guaranteed 
prescribed error. Whether the estimates obtained in this manner are accurate enough 
to lead to practical algorithms with substantially improved run times and storage 
demands over current ones remains an open question for further research. 

Finally, in the recent paper |139j the exponential decay results in |12] are used 
to derive error bounds and stopping criteria for the Lanczos method applied to the 
computation of e~*"^v, where A is a large symmetric positive definite matrix, w is a 
vector, and t > 0. The bounds are applied to the exponential of the tridiagonal matrix 
Tk generated after k steps by the Lanczos process in order to obtain the approximation 
error after k steps. 

11.5. Non-Hermitian extensions. Although the main focus of the paper has 
been the study of functions of sparse Hermitian matrices, many of our results can be 
extended, under appropriate conditions, to non-Hermitian matrices. The generaliza- 
tions of our decay bounds to normal matrices, including for example skew-Hermitian 
ones, is relatively straightforward; see, e.g., the results in [H] and [106] . Further gen- 
eralizations to diagonalizable matrices have been given in |14j . although the bounds 
now contain additional terms taking into account the departure from normality. These 
bounds may be difficult to use in practice, as knowledge of the eigenvectors or of the 
field of values of the matrix is needed. Bounds for functions of general sparse ma- 
trices can also be obtained using contour integration; see, e.g., [106] and [HI]. It is 
quite possible that these bounds will prove useful in applications involving functions 
of sparse, non-normal matrices. Examples include functions of digraphs in network 
analysis, like returnability, or functions of the Hamiltonians occurring in the emerging 
field of non-Hermitian quantum mechanics; see, respectively, [15] and [51 [51 [51]. 

12. Conclusions and open problems. In this paper we have described a gen- 
eral theory of localization for the density matrices associated with certain sequences 
of banded or sparse discrete Hamiltonians of increasing size. We have obtained, un- 
der very general conditions, exponential decay bounds for the off-diagonal entries 
of zero-temperature density matrices for gapped systems ('insulators') and for den- 
sity matrices associated with systems at positive electronic temperature. The theory, 
while purely mathematical, recovers well-known physical phenomena such as the fact 
that the rate of decay is faster at higher temperatures and for larger gaps, and even 
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captures the correct asymptotics for small gaps and low temperatures. Thus, we have 
provided a theoretical justification for the development of 0(n) methods for electronic 
structure computations. As an integral part of this theory, we have also surveyed the 
approximation of rapidly decaying matrices by banded or sparse ones, the effects of 
transforming a Hamiltonian from a non-orthogonal to an orthogonal basis, and some 
general properties of orthogonal projectors. 

In the case of zero-temperature and vanishing gaps, our bounds deteriorate for 
increasing n. In the limit as n -^ cx) we no longer have exponentially decaying bounds, 
which is entirely consistent with the physics. For metallic systems at zero temperature 
the decay in the spectral projector follows a power law, and we have exhibited a simple 
model Hamiltonians for which the decay in the corresponding density matrix is only 
linear in the distance from the main diagonal. 

Because of the slow decay, the development of 0{n) methods in the metallic 
case at zero temperature is problematic. We refer the reader to [51 [121 [HI 1134) for 
some attempts in this direction, but the problem remains essentially open. In the 
metallic case it may be preferable to keep P in the factorized form P = XX* , where 
X E C"X"e is any matrix whose columns span the occupied subspace, and to seek a 
maximally localized X. Note that 

p = xx* = {xu){xuy 

for any unitary Ue x Ue matrix [/, so the question is whether the occupied subspace 
admits a set of basis vectors that can be rotated so as to become as localized as 
possible. Another possibility is to research the use of rank-structured approximations 
(such as hierarchical matrix techniques [53]) to the spectral projector. Combinations 
of tensor product approximations and wavelets appear to be promising. We refer here 
to [SS] for a study of the decay properties of density matrices in a wavelet basis (see also 
[119] ^ and to [TS] for an early attempt to exploit near low-rank properties of spectral 
projectors. See also the more recent works by W. Hackbusch and collaborators [26l 
[2Ill2HlllIlll8ll86]. 

Besides the motivating application of electronic structure, our theory is also ap- 
plicable to other problems where localization plays a prominent role. We hope that 
this paper will stimulate further research in this fascinating and important area at 
the crossroads of mathematics, physics, and computing. 
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